
From: Market Matters

Today’s diverse markets can feel vast and complex. From developments in voice, electronic and algorithmic execution, to regulation’s impact on liquidity, we explore the latest insights.


Trading Insights: All about alternative data

[Music]

Mark Fleming-Williams: There is a universe of data which we're interested in. And so that is three things. One is a long history. I think the longer the history, the more confidence we can have in any signal that we find. Wide coverage. We are interested in something which can tell you about a large number of tickers. And then the third thing, which is extremely important, is point in time. And so from our perspective, even if the data point was wrong, it's very important to have what it was on that day.

Eloise Goulder: Hi, I'm Eloise Goulder, Head of the Data Assets & Alpha Group here at J.P. Morgan. And today I'm really delighted to be joined by Mark Fleming-Williams, who is Head of Data Sourcing at CFM, or Capital Fund Management, a Paris-headquartered quant hedge fund, and also creator and host of the Alternative Data Podcast. So Mark, thanks so much for joining us here today.

Mark Fleming-Williams: Thank you very much for having me.

Eloise Goulder: And I'm excited to be talking with you today, Mark, on two levels, really. First of all, given your role at CFM, I'm really looking forward to hearing your lens on ingesting and trialing data from that seat. But also, given your role as the creator and the host of the Alternative Data Podcast, a podcast which has released, I think, more than 140 episodes now, I'm looking forward to hearing your perspectives on the alt data landscape as a whole and how it has evolved.

Mark Fleming-Williams: Well, I'm delighted to be here. Thank you very much for having me on the show.

Eloise Goulder: So Mark, could you start by introducing yourself and your background in your own words?

Mark Fleming-Williams: Of course. So as you say, I'm the Head of Data Sourcing for CFM. And how I got here: in a previous life, I was a geopolitics analyst for a company called Stratfor out of Texas. And I did have a major career pivot. I knew that AI and data were coming and I didn't have enough exposure to it. So I went to do a Master's in big data in Spain, and I came out and I knew I wanted to get into alternative data, but I didn't know anything about it and I didn't know anyone in it. And when I was a geopolitics analyst, my first method of learning about a subject was to listen to podcasts on it, because I think I'm quite an audio learner. And so I looked for a podcast on alternative data and one didn't exist. And so I thought, well, perhaps I should do that, and created the Alternative Data Podcast as a method to meet everyone in the space and to learn more about it, but also to try to give something to the space in the hope of receiving something back in the shape of a job. And that was a success, because episode 10 was Exabel, which was a Norwegian alternative data technology platform. So I was working for them for a while while keeping the podcast going. And then I had CFM on the podcast a year and a half later. So I joined CFM as Head of Data Sourcing two and a half years ago. And oddly enough, podcast host and Head of Data Sourcing have got an awful lot in common as jobs, because it's all the same people that you're talking to. And it's all the same network and some of the same skills. So it's been a pretty seamless transition ever since I joined CFM.

Eloise Goulder: Well, I can imagine there are enormous synergies between those two roles, which is presumably why you were hired at CFM. Your role as head of data sourcing, what exactly does that entail?

Mark Fleming-Williams: We're a team of three. Our job is essentially to be the part where CFM touches the outside world with regards to data. Our job is to discover all the interesting data, then facilitate the discussions and organisation around trialing that data, and then handle any commercial negotiations as well. It's a job which is very embedded in the alternative data community. When you go to an alternative data event, you'll see a lot of the same people. And the nice thing for me is an awful lot of them have been on the podcast. And it's a funny thing, I don't know if you can confirm, but after having interviewed someone for 45 minutes, you do form a kind of connection, which lasts, actually.

Eloise Goulder: Absolutely!

Mark Fleming-Williams: But also, from our perspective, there's a fair bit of creativity as well, because alternative data can be an awful lot of things that don't yet know they're alternative data. And so a lot of the job can be in trying to discover new sources of alternative data and get a head start, essentially, on the competition by trying to find things that weren't at events and which aren't already in the market. So the discovery part is part knowing people, but also part trying to uncover new sources of data. And then on the trialing part, there's a lot of maximising efficiencies, managing providers and also communication internally with the various stakeholders. We've always got a large number of trials ongoing, and we also know what we're going to trial in three months' time and six months' time, because we know that this data scientist and this quantitative researcher are going to work on it, and they're going to work on it full time for three months. And at the end, they're going to have an answer. And so that actually involves quite a lot of organisation and coordination, and data providers aren't always perfect. And also, you know, there are legal considerations around getting contracts signed and things like that. So there can be delays. So it's about trying to create processes that maximise efficiencies and keep everyone busy, essentially, and not have someone sitting on the sidelines with no trial to do. And then the negotiation part is about looking the provider in the eye and trying to get the best deal for all concerned, so that we're paying the right amount for a data set which we will then take into production. We're extremely careful about this. From a quant fund perspective, we are quite lucky because we're buying something after we've tested it for three months. So I can go into a negotiation with a pretty good idea of what we want to pay for it to make the numbers work. So we've sort of de-risked the transaction, which you don't have when you're buying a car or something; you haven't really properly done a test drive for three months beforehand to know all the quirks. So that's the negotiation. And whenever there's an issue with a provider, if the data hasn't arrived or something, we are kind of the first port of call; a data scientist might say, look, one of our alarms has gone off on this production feed, and so we need to speak to this provider straight away, and we are there to manage that process and get the problem solved. So we are where CFM touches the outside world with regards to data.

Eloise Goulder: That is such a helpful overview of what the role of Head of Data Sourcing entails, and of the sequence of workflow involved in ingesting data, from the investigation to the trial, to the testing, to the legals, to the negotiation. And it is interesting to me that the alternative data landscape is one where free trials are the norm, so that the buyer really does have that clear understanding of how the data can be monetized and what the implied ROI on that data would be. Out of interest, are free trials really the norm? And how much variation do you see in the way data is made available by different players?

Mark Fleming-Williams: It's not always the norm. When people start charging for trials, it is often a sign that they are not that exposed to the quantitative hedge fund space. Because I believe there are other buyers of alternative data, perhaps non-finance buyers who might call it external data, for whom the historical data is the value. From our perspective, for a trial we need to take the entire historical data set. The historical data allows us to train the computer, but it can only continue working if you continue to give us the live data, because then it can start making trading decisions for us. And so the historical data in itself contains no value for us, whereas perhaps for another type of buyer it does. So it's an education process. We have to say to providers, it doesn't make any sense for us to pay for a trial, and they say, oh, well, there are logistics involved in providing you data. And we say, well, there's a PhD in physics who's spending three months looking at your data and also a very highly qualified data scientist. They are not cheap. And at the end of the three months, that could be time wasted very easily for us, and it's a huge opportunity cost, because we could be testing something else instead. So in this trial, you and we are going in and taking risks together during this three-month period. And if at the end of it we have found the value, then we are delighted to send you money and you're delighted to send us the data. It de-risks the transaction. But honestly, I liken charging for a trial to being charged to go into a shop to have a look at what they have. Actually, what you're doing is erecting barriers to us being able to buy your wares, which is not very good commercial sense. So it's a subject I'm passionate about. I've heard outside CFM that the renewal rate for quant funds is 95%. So we're a wonderful client, because we renew. But the challenge is to get on board the train in the first place. Once you're on board, it's very hard to get off. But getting on is a pretty meticulous process, a three-month trial where we're going to be really digging into the data, asking you a ton of questions and getting very comfortable with it, because we also say that the risk of taking on the wrong data set is much bigger than the cost of the data set. Because if we put that into production, it can destroy a lot more value than just the price of the data set. So we are incredibly cautious before we buy a data set.

Eloise Goulder: Well, I think your point about the sunk cost entailed in trialing from your side, even if there are no hard dollars paid to the alt data provider, is so critical, as is all of that negotiation and onboarding. It is a significant effort for, hopefully, a long-term reward. And on the topic of trials, what are the attributes that you and your quant researchers really look for to determine whether or not a data set is valuable? Having as-of data is a bugbear of ours. Are there other attributes that you really deem critical when you're assessing data for its validity?

Mark Fleming-Williams: Sure. There is a universe of data which we're interested in. And so the first job of my team is to see if the data provider we're talking to has data which is within that universe. And if not, then it's very easy to say, sorry, not for us. And so that universe is defined by three things. One is a long history. I think the longer the history, the more confidence we can have in any signal that we find. Three to five years minimum, we would say, for a history. And if we're talking macro, then it's more like five years plus. So a long history. So if we meet a provider who's only been around for a year, then we can't really do anything with you right now. Two is wide coverage. So we're not interested in a data set which will only tell you about Uber and Lyft, for example, although a discretionary investor might be; we are interested in something which can tell you about a large number of tickers, or foreign exchange, or commodities, etc. We need to be able to tell a large story with a data set across the economy. And then the third thing, which is extremely important, is point in time. And point in time is the fact that the data has not been changed from the way it would have been on the day it was created. So if there have been revisions in the data, it's clear what the original data point was, as well as what the revision was. We need to be able to see the world as it was on that day. And so from our perspective, even if the data point was wrong, it's very important to have what it was on that day. So point in time is the third extremely important thing. These three points are asking, essentially, is it quant friendly or not? And if it's not quant friendly, then we're probably not interested. But then on top of that, we're looking for things which are an improvement on what we have already. You know, we have a lot of data sets already in house. And so what's the next thing? We're looking for something which is perhaps very different to what we already have. If there's an angle which we haven't already got covered in our data, then it will increase our diversification. But then also, is there something that's quicker than what we already have? So is it the same type of data set but quicker, or is it better in some other way than what we already have? Our first port of call is: is this something which is potentially of interest? And then it's the quantitative researchers who are saying, oh, this looks good, actually, yes, I would like to know more and potentially trial it. We're waiters, essentially, presenting delicious potential food to these researchers, and they can pick and choose what they might be interested in.

Eloise Goulder: Thanks for articulating all of those facets, and these are all points that we hear very regularly on our side as well. These are essential attributes of a data set to make it quant ready. But then there's your point that an established quant fund has already got so many different data sources running with predictive power. The bar is ultimately getting higher and higher, isn't it, for a new data set to add incremental value. We feel as data providers that there's an enormous appetite to be one of the first to know what our data pipeline is, and ideally to beta test a data set before it's even fully released, because there's this assumption that there might be some incremental orthogonal alpha there and funds want to get onto that quickly. How significant is being first, in your mind?

Mark Fleming-Williams: I think it's not as important as it used to be. My understanding is that in the early days of alternative data, you could have a data set which nobody else had. If you had the first credit card data set, then you were one of two or three funds that had it. Then you could have an awful lot of fun with it and it was a massive differentiator. Now, with so much data in the world and so many people trying to buy the same data sets and crawling all over them, I think the differentiating factor is smaller. However, I think it is very important to strive to be the first, for a number of reasons. You don't want to be the last. You want to be there for any new trends that are arriving. And the race seems to me to be a wider race, which is constantly ongoing, and I think there are more nuanced factors involved. But being at the front of the pack means being up to speed with the latest developments. So it's important to try to be unearthing new data sets which other people don't have. The other thing to bear in mind is that if there's a data set which is not yet in the market, you've got to do an awful lot more work with it as well in order to make it usable for you. And so there are costs involved in trying to be first which you don't have if you're finding a data set off the shelf; if they've got 20 quant fund clients already, then it's probably been whipped into shape by now, and it's more of a plug and play.

Eloise Goulder: Absolutely. Well, we see that as well. Yes, as a beta tester you get the benefit of being among the first to see the signal. But on the other hand, you need to accept that there may be issues, and we as a data provider expect more feedback from the beta testers. And can you describe the evolution of the alt data landscape from your lens?

Mark Fleming-Williams: So I got into the space in 2020, but with my podcast, I've seen myself as a little bit of an amateur alternative data historian. So I've tried to get the people who were influential at important times. I've had Gene Exter, who was the man who might have coined the phrase alternative data in a New York Times article in 2015 or 2016. And I haven't yet had Tony Berkman, who's probably the godfather of alternative data, because he founded Majestic Research back in 2001. So how it developed: Majestic Research got their hands on Yodlee's credit card data set, and then a hedge fund got their hands on it as well. Between these three companies, I see so many of the major influential alternative data individuals who are around today coming from that triangle. That feels to me like the birth moment, credit card data really becoming influential and everyone beginning to see, wow, this is a thing. Because alternative data as a concept, I've seen it traced back to the Babylonians or the Venetians, you know, seeing what flags were on ships to see what was coming into port that day. Or through the noughties, people used to be sent out to look at crops, before there was this kind of big data surge. But the big data revolution really enabled finance to start actually getting ahead of official announcements and getting ahead of official releases. So 2010, 11, 12 seems to me like the real beginning of the boom. And then it just kind of keeps booming, so 15 to 19 is probably what you'd call the glory days, the golden age maybe, of alternative data. And I just missed it, sadly. But you know, the highest compliment that can be given to an alternative data event today is people saying, it's just like 2019, because it was so exciting, you know, there was so much novelty and new types of providers were turning up. And so it must have been a time of real feelings of growth and excitement in the space. And then COVID happened, and people really needed to know what lockdown looked like before the quarterly numbers came out. And so using credit card data and location data suddenly became very important, I think. And so that was a bit of a coming of age, potentially, for alternative data. My feeling is, and it's been said to me as well, that innovation perhaps dried up a little bit in the couple of years after that, 22, 23. The types of data were quite established, and it was more about reaching new efficiencies within those types of data and whether we could deliver it a bit faster. Also, to an extent it perhaps reached a type of saturation within the hedge fund industry as well, which is the primary consumer of alternative data, because it's a big lift actually to play this game. You need to have a large team of data scientists and quantitative researchers to really take it seriously. And so there is a finite number of players who can buy an ultimately finite number of data sets. But then I would say more recently, two things have come about. One is AI, which is obviously changing everything in ways that are very hard to measure and impossible to predict. So that is potentially changing the way data sets can be created and also the way buyers can interact with them. And so the whole dynamic of both sides is changing. For one thing, it's made textual data much more valuable and interesting straight away.
And then there is a suggestion that the corporate sector might finally be waking up to the opportunities in this data and actually understanding what their competitors are doing, using footfall data to work out where they're going to put their next store, what their supply chain is doing right now, you know, properly using these types of data to make business decisions. And if the corporate sector really jumps on this, then possibly it changes the dynamics for the providers, because it just opens up a gigantic quantity of new buyers. There has historically been a reputational risk involved in selling your data, and then there has been a reward. But if the potential reward grows that much, and it gets more normalized, then that decreases the risk and increases the reward, which then increases the number of corporates who might be selling their data as well. And so it could lead to a great explosion in alternative data.

Eloise Goulder: Absolutely. Well, I remember tracking the number of data sets and the types of data sets that were available via all the large platforms from about 2015 onwards. And it was phenomenal the extent to which the number of data sets available increased month on month and year on year over that period. But coming to the future, it's unsurprising that you mention AI and LLMs and the value of textual data increasing, and fascinating that you should mention that the way in which quant researchers analyse not only text but also unstructured data is changing. You mentioned that it's impossible to forecast what happens with AI, but what is your best guess for what's coming in the next few years?

Mark Fleming-Williams: Yeah, an example of an interesting opportunity it creates is that it's so much easier to create a scraper today than it was three or five years ago. And so that could mean that we suddenly get a gigantic army of scrapers who are all scraping everything, which for one thing might cause logistical problems, with everyone trying to ping the same websites at the same time. That could cause problems. But I mean, from our perspective, it's got two sides to it. Because, as I've already said, we want three to five years of history minimum. So if you all start creating your scrapers today, it's not going to be that interesting to us for three to five years. So we are still going to remain focused on that long-history, established kind of data set. And it's very hard for us to get excited about the zeitgeist, actually, from a quant fund perspective.

Eloise Goulder: Well, Mark, I think we've covered such a lot, from all of those stages of the data ingestion process, from sourcing the data to onboarding it, trialling it, testing it and researching it, through to the negotiation and the legals. Such a significant process, which is really at the heart of what you do. I also think it's fascinating that while in a sense this is a very process-driven, data-driven space, it's still also a very human space where relationships are so important. I mean, as you say, you've developed this network of relationships through your job and through your podcast. And relationships are still so important when it comes to understanding which data sets are being worked on, what the pipeline is and how the landscape really is evolving. And I really hope, for all of our sakes, that we are going through a rebirth in the industry as a result of LLMs and the ability to analyze unstructured data, and the ability for more players like the corporates to come in and potentially contribute more data. I think this will make for such a rich and exciting and challenging time in the future. So thank you very much, Mark, for taking all of this time.

Mark Fleming-Williams: Thank you very much, Eloise. It's been a great pleasure.

Eloise Goulder: Thanks also to our listeners for tuning into this bi-weekly podcast from our group. If you'd like to learn more about CFM or indeed Mark's podcast, the Alternative Data Podcast, then please do look at the links in our show notes. Otherwise, if you'd like to be in touch with our team, then do go to our website at jpmorgan.com/market-data-intelligence, where you can always contact us via the form. And with that, we'll close. Thank you.

Voiceover: Thanks for listening to Market Matters. If you’ve enjoyed this conversation, we hope you’ll review, rate and subscribe to J.P. Morgan’s Making Sense to stay on top of the latest industry news and trends, available on Apple Podcasts, Spotify, and YouTube. The views expressed in this podcast may not necessarily reflect the views of J.P. Morgan Chase & Co and its affiliates (together “J.P. Morgan”), they are not the product of J.P. Morgan’s Research Department and do not constitute a recommendation, advice, or an offer or a solicitation to buy or sell any security or financial instrument.  This podcast is intended for institutional and professional investors only and is not intended for retail investor use, it is provided for information purposes only. Referenced products and services in this podcast may not be suitable for you and may not be available in all jurisdictions.  J.P. Morgan may make markets and trade as principal in securities and other asset classes and financial products that may have been discussed.  For additional disclaimers and regulatory disclosures, please visit: www.jpmorgan.com/disclosures/salesandtradingdisclaimer. For the avoidance of doubt, opinions expressed by any external speakers are the personal views of those speakers and do not represent the views of J.P. Morgan. © 2025 JPMorgan Chase & Company. All rights reserved.

[End of episode]


Analyzing alternative data can be complex and challenging – but it can also be highly rewarding. In this episode, Mark Fleming-Williams, head of Data Sourcing at CFM and creator of “The Alternative Data Podcast,” speaks with Eloise Goulder, head of the Data Assets & Alpha Group at J.P. Morgan. They discuss the value of alt data, how the viability of a data set is assessed and what AI and LLMs mean for the future of the industry.

Learn more about the Data Assets & Alpha Group

This episode was recorded on April 29, 2025.

