Unveiling the online journey with Rax


It is generally believed that a lot of valuable information is hidden in people's online behavioral data. By uncovering patterns in this behavior, we could possibly achieve better segmentation and ad targeting, better website personalization, etc. However, looking for patterns in (online) behavioral data is not easy. It requires handling the temporal aspect of the data, and most existing tools lack expressive power in this area. Also, including the temporal dimension of people's behavior in the analysis makes each person unique, which makes it hard to come up with a predictive model.

We are setting up a research project whose goal is to gain better insights and enable better decision making based on online journeys. We are doing this research in cooperation with Pointlogic, a company with over 20 years of experience in analysing data, especially for the media and advertising world. We hope that by combining our tools (like Rax) and our expertise in handling temporal data with Pointlogic's mathematical expertise and knowledge of the advertising world, we can discover new, valuable patterns in the data. These should eventually lead to more efficient metrics for predictive modeling of people's online behavior.

For this research project, we're looking for up to 5 partners - organizations that are in possession of online behavioral data, such as clickstreams, online ad impressions and click-throughs, etc., and who would like to use this data for better targeting and personalization. If you think it might be something for you, please leave your email below, and we will contact you.

Enter your email address:

For more details about this research project, read below.

Current situation in online advertising

For online advertising campaigns to be effective, one needs to be able to predict conversion towards certain goals. Goals could be sales, but also downloading information, contacting the company, etc. Predicting conversion relates directly to the money spent via real-time bidding, programmatic buying and behavioral targeting.

We observe that behaviour is often summarised by visiting certain pages or by consumer engagement with certain keywords. For example:

  • We’ll target everyone that has been to www.runningshoes.com
  • We’ll put a higher bid for everyone that has been to my website
  • We perceive those that searched for “buying running shoes” as the most important target

The above is executed with a lot of sophistication, but we feel that the time dimension of behavior is missing from most analyses. For example, they rarely address questions such as:

  • Are consumers that first go to website 1 and then to website 2 more likely to convert?
  • Should we focus on consumers that visit many of our online assets in a short period of time?
  • Should we increase our bidding a couple of weeks after a consumer starts their journey showing interest in our products?
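
The first question above can be made concrete with a small sketch. It assumes each consumer's visits are available as (timestamp, site) pairs; the function name `visited_in_order` and the optional `max_gap` parameter are hypothetical and only serve the illustration.

```python
from datetime import datetime, timedelta

def visited_in_order(visits, first_site, second_site, max_gap=None):
    """Return True if a visit to first_site is followed by a later visit to second_site.

    visits: iterable of (timestamp, site) tuples, in any order.
    max_gap: optional timedelta; if given, the later visit must start within it.
    """
    firsts = [t for t, site in visits if site == first_site]
    seconds = [t for t, site in visits if site == second_site]
    for t1 in firsts:
        for t2 in seconds:
            # The second visit must be strictly later, and close enough if a limit is set.
            if t2 > t1 and (max_gap is None or t2 - t1 <= max_gap):
                return True
    return False

visits = [
    (datetime(2024, 1, 1, 10, 0), "website 1"),
    (datetime(2024, 1, 2, 4, 0), "website 2"),
]
print(visited_in_order(visits, "website 1", "website 2"))
print(visited_in_order(visits, "website 1", "website 2", max_gap=timedelta(hours=1)))
```

Comparing conversion rates between consumers for whom this check holds and those for whom it does not would be one simple way to approach the question.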

However, including the time element is very hard. Dealing with time in data is by itself very complex. What's worse, including the time dimension makes each consumer unique: we can segment those who went to a website, but only one consumer went to that website first, stayed there for 12 minutes and went to another website 18 hours later. Compressing the time dimension into a predictive model is therefore an additional mathematical challenge.
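
One possible way to deal with this uniqueness, shown here only as a sketch and not as the method of this project, is to coarsen the exact times into a few buckets so that different consumers can share the same pattern. The function name `journey_signature` and the one-hour and 24-hour thresholds are assumptions chosen for the illustration.

```python
from datetime import datetime, timedelta

def journey_signature(visits, short=timedelta(hours=1), medium=timedelta(hours=24)):
    """Reduce a timed journey to a coarse signature that several consumers can share.

    visits: list of (timestamp, site) tuples.
    The gap between the timestamps of consecutive visits is replaced by a label.
    """
    ordered = sorted(visits)
    parts = []
    for i, (timestamp, site) in enumerate(ordered):
        if i > 0:
            gap = timestamp - ordered[i - 1][0]
            if gap <= short:
                parts.append("short")
            elif gap <= medium:
                parts.append("medium")
            else:
                parts.append("long")
        parts.append(site)
    return tuple(parts)

# Two consumers with different exact timings (18 and 20 hours between visits).
consumer_a = [(datetime(2024, 1, 1, 10), "site1"), (datetime(2024, 1, 2, 4), "site2")]
consumer_b = [(datetime(2024, 3, 5, 9), "site1"), (datetime(2024, 3, 6, 5), "site2")]
print(journey_signature(consumer_a) == journey_signature(consumer_b))
```

The exact journeys differ, but both reduce to the same signature, so they can be counted together. The trade-off is that the choice of buckets decides which differences in timing are kept and which are lost.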

The research

We consider two different but related research questions:

  • Phase 1: Can we visualise online journeys that include the time factor and order of activities in a way that provides insights into these journeys (across audience segments)?
  • Phase 2: How can we develop metrics for individual journeys allowing these metrics to be used in predictive modelling?

We believe that answering the research question for phase 2 would, on its own, be sufficient to improve media buying. However, phase 1 adds the human touch, leading to a deeper understanding of consumer behavior and, as such, to improved strategies.

Phase 1: Visualising online journeys

  • The goal of the first phase is to develop a methodology to visualise a large number of individual journeys in a way that provides clear insights.
  • We should be able to segment the audience based on conversion and as such easily see how those that convert behave differently.
  • The challenge is to find and visualise behavioral patterns instead of only looking at individual behavior.
  • We will research various methodologies to identify meaningful patterns and to visualise the data.
  • We will provide the charts/visualisation for various goals as requested by our partner. Ownership of these reports will lie with our partner. We will own the underlying methodology.
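
As a rough illustration of looking at patterns per segment rather than at individuals, the sketch below counts ordered site sequences separately for those who converted and those who did not. The input layout (a dict from a visitor id to timestamped visits, plus a set of converted ids) and the function name are assumptions; the actual methodology is still to be researched.

```python
from collections import Counter

def pattern_counts_by_segment(journeys, converted_ids):
    """Count ordered site sequences separately for converters and non-converters.

    journeys: dict mapping a visitor id to a list of (timestamp, site) tuples.
    converted_ids: set of visitor ids that reached the goal.
    """
    counts = {"converted": Counter(), "not_converted": Counter()}
    for visitor_id, visits in journeys.items():
        # Keep only the order of sites, collapsing consecutive repeats.
        sequence = []
        for _, site in sorted(visits):
            if not sequence or sequence[-1] != site:
                sequence.append(site)
        segment = "converted" if visitor_id in converted_ids else "not_converted"
        counts[segment][tuple(sequence)] += 1
    return counts

journeys = {
    "a": [(2, "shop"), (1, "blog")],
    "b": [(1, "blog"), (2, "blog"), (3, "shop")],
    "c": [(1, "shop")],
}
counts = pattern_counts_by_segment(journeys, converted_ids={"a", "b"})
print(counts["converted"].most_common(3))
print(counts["not_converted"].most_common(3))
```

Counts like these could feed a chart that contrasts the most common paths of the two segments.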

Phase 2: Metrics for predictive modeling

  • The goal is to develop a metrics framework that generates individual metrics based on one’s unique journey/behavior.
  • The metrics framework will be optimised for the ability of the metrics to predict whether the individual will (or will not) convert on a specific goal. Metrics will be derived from things like the recency of site visits, the order of the visits, the duration of the visits and the time between visits.
  • We expect to derive the candidate metrics from the visualisation of behavioral patterns (phase 1). These will be tested against various goals and further optimised.
  • We’ll deliver the metrics actually used, how they can be calculated from the data, and the predictive power of these metrics (compared to currently used metrics).
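
To give an idea of what such metrics could look like, here is a minimal sketch that computes recency, order, duration and the time between visits for one journey. The `Visit` type, the function name and the specific metrics are assumptions for illustration; the metrics that come out of the research may well be different.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Visit:
    site: str
    start: datetime
    end: datetime

def journey_metrics(visits, now):
    """Compute a few candidate metrics for one individual's journey."""
    ordered = sorted(visits, key=lambda v: v.start)
    if not ordered:
        return {}
    durations = [(v.end - v.start).total_seconds() for v in ordered]
    # Time between the end of one visit and the start of the next.
    gaps = [(b.start - a.end).total_seconds() for a, b in zip(ordered, ordered[1:])]
    return {
        "recency_seconds": (now - ordered[-1].end).total_seconds(),
        "site_order": [v.site for v in ordered],
        "total_duration_seconds": sum(durations),
        "mean_gap_seconds": sum(gaps) / len(gaps) if gaps else None,
    }

visits = [
    Visit("site2", datetime(2024, 1, 2, 4, 12), datetime(2024, 1, 2, 4, 20)),
    Visit("site1", datetime(2024, 1, 1, 10, 0), datetime(2024, 1, 1, 10, 12)),
]
print(journey_metrics(visits, now=datetime(2024, 1, 2, 5, 20)))
```

Each individual then becomes a row of numbers that a predictive model can consume, which is what makes unique journeys comparable.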

Necessary data

First and foremost, we would need data to establish the individual journey across a variety of online assets. About three to four months of data should be enough.

We expect to use data from a single device (e.g., a desktop) as a proxy for an individual, but we would obviously love to distinguish between individuals if possible. However, we don’t need data that allows us to identify specific individuals: the data can be anonymized.

Finally, we would need to get data showing conversions towards certain goals (using an identifier of the user or of the device).

Example: clickstreams from owned websites

For the various relevant (owned) websites and mini-sites, we would need clickstream data containing:

  • URL
  • Timestamp
  • User’s identifier (if relevant, for example resulting from a login)
  • Referrer URL
  • Time spent on page (or timestamp of leaving page)
  • Cookie information identifying the specific browser/device
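
The fields above could be represented as one record per page view, for example as in the sketch below. All field and column names (`device_cookie`, `left_at`, and so on) are hypothetical; the real names will depend on how each partner exports the data.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass(frozen=True)
class PageView:
    url: str
    timestamp: datetime
    device_cookie: str
    user_id: Optional[str] = None
    referrer_url: Optional[str] = None
    seconds_on_page: Optional[float] = None

def parse_page_view(row):
    """Build a PageView from a dict of strings; the column names are assumptions."""
    timestamp = datetime.fromisoformat(row["timestamp"])
    if row.get("seconds_on_page"):
        seconds = float(row["seconds_on_page"])
    elif row.get("left_at"):
        # Derive time on page from the timestamp of leaving the page.
        seconds = (datetime.fromisoformat(row["left_at"]) - timestamp).total_seconds()
    else:
        seconds = None
    return PageView(
        url=row["url"],
        timestamp=timestamp,
        device_cookie=row["device_cookie"],
        user_id=row.get("user_id") or None,
        referrer_url=row.get("referrer_url") or None,
        seconds_on_page=seconds,
    )

row = {
    "url": "http://www.runningshoes.com/",
    "timestamp": "2024-01-01T10:00:00",
    "left_at": "2024-01-01T10:12:00",
    "device_cookie": "device-123",
}
print(parse_page_view(row))
```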

Example: data from ads and banners

For the various paid media, we would need the following for each impression:

  • Timestamp
  • If available in the cookie: user identification
  • Cookie information identifying the specific browser/device
  • Flag to identify if the user clicked the ad
  • Duration of the impression or timestamp of clicking the ad
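
Impression data of this shape could be turned into time-ordered events that slot into the same journey as the page views. The sketch below assumes one record per impression with the fields listed above; the names `Impression`, `clicked_at` and `to_events` are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass(frozen=True)
class Impression:
    timestamp: datetime
    device_cookie: str
    clicked: bool
    user_id: Optional[str] = None
    clicked_at: Optional[datetime] = None

def to_events(impressions):
    """Flatten impressions into time-ordered (timestamp, device_cookie, kind) events."""
    events = []
    for imp in impressions:
        events.append((imp.timestamp, imp.device_cookie, "impression"))
        # A click becomes its own event only when its time is known.
        if imp.clicked and imp.clicked_at is not None:
            events.append((imp.clicked_at, imp.device_cookie, "click"))
    return sorted(events)

impressions = [
    Impression(datetime(2024, 1, 1, 12, 0), "device-1", True,
               clicked_at=datetime(2024, 1, 1, 12, 0, 5)),
    Impression(datetime(2024, 1, 1, 9, 0), "device-2", False),
]
for event in to_events(impressions):
    print(event)
```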

If you are interested in taking part in this project, leave your email address, and we will contact you:

Enter your email address: