Navigating the Dynamics of Financial Embeddings Over Time

How Graph Representation Learning can capture financial system patterns in a meaningful and robust way


Consider a shopping plaza. There are restaurants, a few big-box stores and national retail chains, a grocery store, local mom-and-pop shops, and several specialty stores.  Each of these merchants fulfills one or many possible needs for a customer. These needs are driven by a complex mixture of physical and social realities, and it is these needs that determine how a customer makes their way through the shopping plaza. After all, customers may be looking for winter coats in December or swimwear in July, shopping at one or the other retailer depending on the season. The shopping trip of a family that just moved into a new home will look different compared to their shopping pattern a year previously, with a wide range of new furniture and home supply stores added to the list of merchants they interact with.

A more timely example is that when COVID-19 started affecting our communities, a sharp delineation between essential versus nonessential businesses and online versus brick and mortar businesses was drawn. Many customers started looking for online products and services, and businesses adapted to the new reality by changing their interaction model with their customers. As customer behavior was greatly impacted by the impacts of quarantine and which products and services were easily accessible, new shopping habits were formed.

In reality, everyone’s relationship with their preferred merchants changes depending on the season, social and family factors or the economic climate of the times. In essence, these relationships are mutable in both time and space. While these relationships are unique to individuals, as a financial institution Capital One is interested in its customers’ changing relationships with merchants.

Accounts as Words, Time as Space

As machine learning practitioners, we ask ourselves how can we better understand these relationships from the vast transactional data over multiple points in time?

The series of transactions someone makes as they shop at Restaurant A, Grocery Store B and Gas Station C in a limited amount of time forms a sequence, A->B->C. The hundreds, if not thousands, of customers that shop at some combination of these stores every day creates a network of sequences of transactions. In this instance, we are not interested in an individual’s relationship with the stores, but rather the collective interaction patterns with these merchants. From a macroscopic perspective, we hypothesize that there are clearly identifiable trends, and most importantly inflection points, represented in these sequences.

To investigate this hypothesis, we project the aforementioned sequences into a latent dense space that represents closeness in terms of similarity of shopping patterns. These representations, namely embeddings, are generated using a highly scalable shallow neural network known as a skip-gram. This was covered in a previous Capital One blogpost on DeepTrax which you can read here: https://www.capitalone.com/tech/machine-learning/learning-embeddings-of-financial-graphs/

One way to view these embeddings is as an efficient mathematical summarization of the (latent) state of an entity relative to all others in that system. In our financial transactions, this means each embedding represents a merchant’s or an account’s state. Of course, people and economies change all the time, impacting that state. Therefore, it becomes imperative to update our understanding of their latent state in a dynamic setting. There are a few ways to do this. In recent work that we presented in a spotlight talk in a ICML Graph Representation Learning (GRL+) , which we will also be sharing at ICAIF (International Conference on AI in Finance) in October, we showed how we train dynamic embeddings of financial transactions and the meta-analysis steps that we perform to extract distinguishable real-world trends from them.

In our research, we seek to answer the following questions:

  • How do embeddings change over time?
  • Do they change seasonally?
  • Do they change in response to large endogenous shocks such as COVID-19?
  • How does one measure if this change is meaningful or not?

Measuring and Interpreting Changes

There is an important difference between meaningful changes and the random rotation of the embedding space inherent to a noisy training process. As we discussed in an earlier blog post, in graphs generated from transactional data, new nodes are continuously coming in while at a slower but still consistent rate other nodes are becoming inactive. New edges are formed while past connections can get eliminated and the full set of timesteps are not available beforehand. To address these issues we use warm start training, meaning we initialize the next timestep’s model with the weights from the previous one, only updating the nodes that obtained new edges. Additionally, we remove “inactive” nodes that have not received new transactions in over 12 months. In this consistently updating setting, we expect the latent representations to update smoothly but meaningfully. How do we measure this representation shift, and most importantly, how does it relate to real semantic shift?

Consider some specific merchants or accounts, which we will call the seed. The seed’s reconstructed 2-hop neighborhood is specifically the top k similar accounts, or merchants, measured by cosine distance in the embedding space along with their neighbor’s top k similar entities. Neighborhoods can have different sizes, and most importantly, can change over time.

clusters of pink, green, orange, and blue dots connected by black arrows

Figure 1: Example of neighborhood formulation around the same seed node over two subsequent timesteps

Figure 1 shows an example of a reconstructed 2-hop neighborhood calculated from two different snapshots of the embedding space. The seed node is depicted in the center, with a larger size.  The further away we move from the seed node and the less central a node is, the smaller the size of the node in this representation. The colors represent modularity based communities inside the neighborhood. As new nodes get added (right figure), some nodes change community membership while others form more connections with one another and end up splitting apart from their original community.

By looking at neighborhood composition and the similarities between the same embeddings across time, we can get a clearer picture of the shifting embedding space and understand if there are global trends driving the representation change, versus what may be an artifact of random wiggling in the latent space. This will help us answer whether a node changes in accordance with its neighbors or whether nodes change drastically and potentially move to another neighborhood of the graph.

For merchants, specifically, another way to aggregate changes to the embeddings over time is by looking at the type of merchant. Oftentimes merchants from the same industry occupy their own neighborhoods -that is, they move together. Airlines and travel industry merchants, for example, can be viewed as a segmentation of all of the merchants.

Shifts and Disturbances Before and During COVID

Recent events related to the Covid-19 pandemic provide an interesting use case in measuring semantic shift. Compared with representation shift, we can draw conclusions on the effectiveness of our proposed framework and gain insights into the pandemic’s effects on merchant-consumer interaction.

Line graph showing red, orange, blue, and green overlapping lines decreasing

Figure 2: Average shift over all time stamps for different merchant categories

First off, when grouping by merchant category, we found that average cosine distance shifts over time differed. This is explained above in line chart Figure 2. Even though peaks and valleys happen around the same timesteps, mostly due to global economic trends, the actual deltas - aka the size of the shift - is different per category. Shifts are often also related with seasonality. For instance, towards the holiday season at the end of each year. When combined with the bar graph in Figure 3 below, demonstrating the number of merchants that had their maximum shift in that month, we see that periods of large cosine shift corresponded to major events - specifically, the upheaval of the early stages of COVID in the right end of the chart.

bar graph with blue bars and red arrows pointing to highest peaks

Figure 3: Normalized count of merchants that exhibit their maximum shift in each timestep

In Figure 4 below we Move into a microscopic view of neighborhood change for individual seed nodes. The seed node’s trajectory in cosine distance between timesteps is depicted in black, while its neighbors in the beginning (2017-12) are depicted with red and the neighbors in later timesteps (2020-03) are depicted in blue.

Two interesting observations can be made here:

  1. Neighbors often follow identical trends with the seed node (e.g. J Crew and Banana Republic).
  2. With the beginning of the pandemic, neighbors are mostly based on similarity of the services (e.g. Equinox and SoulCycle) and less on  co-location of the businesses (e.g. Walmart and nail salons). Since a large portion of transactions were happening purely online, the effect of location in the transactional patterns became less prevalent.
6 cosine shift graphs side by side, with black, purple, red, and pink lines

Figure 4: Cosine shift of seed node (depicted in black) and the respective shifts of their neighbors from the first time stamp (red) and the last one (blue)

Conclusion and Future Work

Ultimately, exploring the way our embeddings change over time gives us insight into the actual change in collective behavior of our customers and merchants. In a d-dimensional latent space we need one more axis to account for time and it is the projection on this axis that will capture the complexity of transaction patterns and the dynamics that are derived from them. The ebb and flow of the embedding space, the speed of change in embeddings, and the aggregated statistics through each time step give us a deeper understanding of our transactions that we didn’t have before.

For more information, please read our KDD and ICML paper, our previous work on financial graph embeddings and our upcoming work to be presented at ICMLA’2020.


Antonia Gogoglou, Engineering Manager, Applied Research

Antonia Gogoglou is an engineering manager in the Applied Research group at Capital One Financial. She leads Graph ML efforts that attempt to bridge state of the art and classical graph mining with financial systems. She holds a PhD in Computer Science and has experience in social network analysis, spam and fraud detection, entity resolution and user segmentation.


DISCLOSURE STATEMENT: © 2020 Capital One. Opinions are those of the individual author. Unless noted otherwise in this post, Capital One is not affiliated with, nor endorsed by, any of the companies mentioned. All trademarks and other intellectual property used or displayed are property of their respective owners.

Related Content