Carl Ferrer - #TrueDataOps Podcast Ep. 43

[00:00:03] Speaker A: Hey, welcome to the True DataOps podcast. We'll get started in a few more seconds to allow people to get logged on to the live stream. See you in a few. Welcome to this episode of our show, True DataOps. I'm your host, Kent Graziano, the Data Warrior. In each episode, we try to bring you a podcast discussing the world of DataOps and the people who are making DataOps what it is today. So be sure to look up and subscribe to the DataOps Live YouTube channel, where you can find the recordings of all our past episodes. If you've missed any of the prior episodes, now is a great chance to catch up. Better yet, go to truedataops.org and subscribe to this podcast. My guest today is an experienced solution architect, CTO, and guitarist who is working with the recently acquired Forwardview, Mr. Carl Ferrer. Welcome to the show, Carl.
[00:01:10] Speaker B: Hey Kent, how are you?
[00:01:12] Speaker A: Very good. For folks who aren't familiar with you, why don't you tell us a little about your background and your career, and how you got to where you are today?
[00:01:23] Speaker B: Yeah, sure. I took a slightly unusual path into computing. Some of you will probably notice the guitars in the background: I started off working in music and in recording studios. In the late 90s there was a real push of computers into recording, with things like Pro Tools coming in during the 2000s. That got me interested in... well, it made me realize I wasn't a good enough musician, and that there were other things I could put my heart into.
So I started working with computers in recording studios, and that morphed into working more with data. I progressed through data-related roles at multiple organizations, moving into more specific areas of technology and also specific kinds of customers, focusing primarily on financial services. That brought me toward Forward View; I think I was employee number two, or maybe number three. I brought a lot of experience from working in data governance: looking at data catalogs and AI and ML catalogs, which back in 2016, 2017 were quite new and interesting, before all the fatigue set in. I also spent a lot of time working for a company called Delphix, which does data virtualization and data protection, again building on governance.
[00:03:14] Speaker A: Yeah, I knew quite a few people there at Delphix.
[00:03:17] Speaker B: Yeah. You know what, Delphix was a really interesting little place, because it had such a hotbed of talent from all sorts of different areas. That's one of the things I really like: you see people come from all different backgrounds and come together to focus on data or technology, and Delphix was one of those great places. In 2014, 2015, when I was there, it was really the start of the data virtualization breakout, taking the VMware side of infrastructure and pointing it toward data, which I think was a fantastic movement. So now I'm CTO at Forward View, and the primary focus there is bringing a quarter of a century's worth of gray hairs in data to financial services, helping those organizations grow and build out in a very data-centric way.
[00:04:32] Speaker A: Yeah.
So in this season of the show, as we've discussed, we're taking a step back and thinking about the world of True DataOps and how it's evolved over the last couple of years. Obviously you've been a practitioner in the field going back quite a ways, as you said, helping customers take advantage of their data, and more recently, of course, the cloud. Can you give me an idea of how you've seen this space evolve through these years, especially the last couple of years, and some of the focus you now have at Forward View?
[00:05:09] Speaker B: Yeah, man. I think one of the things inherent in technology, underlying it everywhere, is that people focus so much on the technology: the flashy lights and the wizard bits, the things that are shiny and attractive. Where we've seen the DataOps movement come in is that people are starting to realize you can have all this shiny stuff that looks great, but you can't do anything with it unless you put a framework around it. If you look back at the early 2000s and 2010s, at things like VMware again, you had this capability to spin up all this stuff all over the place, super powerful, but without the necessary controls around it. I think that is where DataOps is really helping: it takes all these fantastic whizzy things, puts a framework around them, and lets people make much better use of them. And I think we're now coming to a point, with the seven pillars of DataOps being a really good example, where you're starting to get mainstream realization of DataOps.
Whereas if you mentioned it a few years ago, people would say, "You meant DevOps, didn't you?"
[00:06:46] Speaker A: Yeah, exactly.
[00:06:50] Speaker B: So seeing that evolve has been really powerful. Splitting it out, going from "Oh, you thought it was DevOps" to "There's DevOps and there's DataOps," and using DataOps as a conduit and an enabler rather than just a buzzword, is another way to put it.
[00:07:14] Speaker A: Yeah, I think initially people started off with "Well, what's DataOps? It's DevOps for data." Okay, yeah, that kind of gets there. And that's why Justin and Guy and I got together and tried to craft the seven pillars. We were thinking, well, what are we really trying to do, and how can we express it to people in a way that makes sense, so they realize it's not just DevOps? There's more to it, because we're dealing with data. And as you know, data is a different beast than code.
[00:07:51] Speaker B: Yeah, absolutely. That's one of the things we've seen: with the rise of DevOps, bugs in code start to decrease, and that brings more focus onto the actual data. One of the phrases we've coined at Forward View, and used for quite a while, is the "data quality hockey stick." That's where you see organizations really start pushing the Spotify model and all that kind of stuff.
The devs are all in a good place, with all the enabling technologies that let them create branches and do all this cool stuff, and they go from releasing once a quarter to once a week or once every couple of days. Then all of a sudden they hit this data quality hockey stick, where the data becomes the problem: not just its quality, but even accessing it and getting it quickly enough. We're seeing more and more organizations focused on getting that data, where the competitive difference is how quickly they can get that data and work with it. And this is one of the cool things I like about DataOps: it's really starting to be effective now in helping those organizations take the same spin they've had with DevOps, which has been really successful, and put it onto data as well.
[00:09:30] Speaker A: Yeah, because as the volume of data has gone through the roof, the cloud has enabled us to do things that Hadoop only dreamed of, and then we're trying to process it faster. Like you said, the competitive advantage is in whether you can turn that data into useful, business-centric information fast enough. So that means pipelines, right? How fast can you build a pipeline? But as that speed increases, the occurrence of bugs can increase as well, if you don't have some controls around it: some governance and, like you said, real data quality testing.
[00:10:21] Speaker B: And I think that's the million-dollar question for a lot of the financial services organizations we focus on: where are all the bugs? Is it the dev, or is it the data? And how do you get to the answer quicker?
I think we regularly say it's just human nature, the path of least resistance. You turn to people and say, "Oh, it's a problem with the code." And the devs say, "It's not a problem with our code. Our code's great, thanks; it's tested, it works." Then you go to the people with the data, and it's, "Well, I know my data. Are you trying to say my data is wrong? Of course my data's not wrong." You spend so long in that little loop until somebody is actually willing to stand up and say, "Actually, yeah, maybe there is a challenge with this." So how do you choose what to look at first? There isn't really a right or wrong answer; it's based on experience. The code people understand the code, the data people understand the data, and it's about bringing that together and working out where to start first. To be slightly topical, look at the challenges the conflict in Ukraine brought to things like risk models. Large banks that might normally run risk models two or three times a day would see something mentioned on the news, or some new report, and then they'd be running risk models in parallel: instead of maybe five or six a day, they were doing something like 100. So you start looking at that and asking, is it the right data? All of these things are really fundamental. Your competitive advantage could come down to one bank changing its risk model in a way that allows a certain process to go ahead, and the difference can be catastrophic.
[00:12:33] Speaker A: Yeah, that falls into the pillar on automated regression testing and monitoring, which most people know about.
We talk about observability, but what you're saying accelerates it to the point where it's not just, "Okay, we're doing a two-week sprint, and we'll run all these regression tests at the end of the sprint to make sure everything's good to go to production." Now you have to do it not only daily but many times a day, because things are changing and moving so fast.
[00:13:09] Speaker B: Yeah. One of the things I liked when I first came across the seven pillars was that monitoring was in there. There are so many organizations where the automation is there, everything is fantastic, and they put loads of effort into it, but then things change, with the data, with the code, and with all sorts of other things around it, and things are going to go wrong. So how quickly you can understand that, coming back to the earlier point about even deciding whether it's a bug or the data, is fundamental. So many organizations just don't put the focus on the observability side, because they assume the Spotify model will sort it, or they bought Snowflake so Snowflake will sort it because it's data-related. That was my point earlier: so many people focus on the whizzy stuff and the flashing lights rather than taking a step back and thinking, "What if this happens? What if that happens? How would I react to that, or how would my teams react?" That's one of the things I like about this focus on data. I'd class dev as slightly more technical, more about the technology, and data as more fundamental.
[00:14:37] Speaker A: Yeah, definitely. With things changing, it was hard enough in the early days of data warehousing to get people to test their ETL code at all. It was like, "Okay, the data moved from the source to the target. Woohoo, we're done. We have a star schema, there's a bunch of data in there." But thinking through what that data is really supposed to say: What should it look like? What are the valid domains for basically every field, not just one or two fields? What do the business keys look like, what do the customer names look like, what kind of sales numbers should we expect? Is there a boundary there somewhere? Because otherwise you don't know. This is where it gets into the data, right? It's not just testing the code. The code's running great, but what's coming out the other side? Is it within the range of expectations, so that yes, this is normal? If you're a small or medium-sized business and your annual revenue is somewhere around 2 or 3 million dollars, and one transaction comes through that's 1.5 million, that's probably wrong. But if you don't catch it, it gets all the way to the BI layer, and somebody runs a report and says, "Wait, how is our revenue 7 and a half million? That's double what it was last year, and it just changed this quarter." So that was one of the challenges, and I think that's where observability and monitoring of the data itself come in, right?
[00:16:16] Speaker B: Yeah, and as you put it, there's a transaction that's out by a million.
When you start looking at things like anti-money laundering, where you've literally got millions of transactions, a lot of those processes are quite clever, or the people who run them are quite clever. We've all seen the kind of scam where somebody has a card number, a 99p or $0.99 Kindle book purchase comes through, then it turns into a $5 or £5 charge, and then it keeps increasing, to see whether you've noticed or not. It's about bringing that kind of logic to these large data sets. And again, look at the cloud. The cloud is, in some ways, a slight bugbear of mine, in that everyone, or I should say so many people, focus on the cloud being the solution to everything. Coming back to your point about the ETL side of things, so many organizations say, "We're going to go cloud first, so we're going to move everything to the cloud." That can be driven by bad experiences with data lakes, where Hadoop was going to be the answer to everything: move everything into Hadoop, and pretty quickly it turns into a mess. Then the solution is, "We're going to move everything out of the data lake and out of on-prem." So now there are two sources, way more data, and it's all moving to the cloud. Again, it's the process of taking your messy data warehouse and putting it into somebody else's data warehouse, in somebody else's data center, and it becomes way, way more complicated. And I think the focus on metadata and governance for anything relating to data is something that got quite lost around 2017, 2018.
Everyone focused on data catalogs, as if data catalogs were the solution to everything. I spent time in a data cataloging startup, so I experienced this firsthand. It was all about AI and ML: it's the best thing in the world, it's going to help you do everything. I had a kettle at the time for making tea and coffee that said on the box, "It's an AI/ML kettle." All it did was have an app, and when you came in your front door and connected to your Wi-Fi, it basically turned the kettle on. That's not AI or machine learning. But the metadata side of things is so overlooked, and I think a lot of that came from the fatigue with these catalogs. I'm going around the houses a little bit here, but where I'm getting to is this: look at outliers in data. So many organizations are starting to look at things like synthetic data, and they focus primarily on the core data and don't want the outliers. But in reality you need the shape of the data, including the outliers and everything else, to make it relevant. If you're processing and working with data and you don't understand the things outside the ordinary, you're not making decisions as informed as they could be. And I think the only way to track that sort of thing is with increased use of metadata, maybe looking at some of the ways we approach it: you can have business-context metadata and pure technical metadata, and it's about leveraging the right level of each, so you're not swamped or overloaded, but you have enough for the use case.
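The two data checks discussed above, Kent's "is there a boundary?" question about transaction amounts and Carl's example of card charges that creep upward to see whether anyone notices, can be illustrated with a minimal Python sketch. Everything in it is a hypothetical illustration, not a method described in the episode: the `Transaction` record, the `out_of_range` and `escalating_cards` helpers, the 100,000 per-transaction bound, and the three-charge threshold are all assumptions. A real pipeline would presumably derive such bounds from profiling and the kind of metadata Carl describes rather than hard-coding them.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Transaction:
    # Hypothetical record shape, for illustration only.
    card: str
    amount: float


def out_of_range(txns: List[Transaction], low: float, high: float) -> List[Transaction]:
    """Return transactions whose amount falls outside the expected [low, high] range."""
    return [t for t in txns if not (low <= t.amount <= high)]


def escalating_cards(txns: List[Transaction], min_run: int = 3) -> List[str]:
    """Return cards with at least `min_run` consecutive, strictly increasing charges.

    Assumes `txns` is already in time order; `min_run` is an assumed threshold.
    """
    last_amount: Dict[str, float] = {}
    run_length: Dict[str, int] = {}
    flagged: List[str] = []
    for t in txns:
        prev = last_amount.get(t.card)
        if prev is not None and t.amount > prev:
            # Charge is higher than this card's previous one: extend the run.
            run_length[t.card] = run_length[t.card] + 1
        else:
            # First charge for this card, or not an increase: restart the run.
            run_length[t.card] = 1
        last_amount[t.card] = t.amount
        if run_length[t.card] >= min_run and t.card not in flagged:
            flagged.append(t.card)
    return flagged


if __name__ == "__main__":
    txns = [
        Transaction("A", 0.99),
        Transaction("B", 120.00),
        Transaction("A", 5.00),
        Transaction("A", 25.00),
        Transaction("B", 1_500_000.00),
    ]
    # Hypothetical per-transaction bound for a business with ~2-3M annual revenue.
    print(out_of_range(txns, low=0.0, high=100_000.0))  # prints [Transaction(card='B', amount=1500000.0)]
    print(escalating_cards(txns))  # prints ['A']
```

With this sample data, only the 1.5 million transaction falls outside the assumed bound, and only card A is flagged, because it has three consecutive, increasing charges (0.99, 5, 25).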
And again, if we're looking at observability and using metadata to drive it, that's something that hasn't been used as much as it could be, in the past and certainly at the moment.
[00:21:13] Speaker A: Yeah, I think we've struggled with that. Throughout my career in data warehousing, if you think about Bill Inmon's Corporate Information Factory, it had metadata in there: there was the metadata warehouse and all sorts of things. But harnessing it, harnessing it well, has been a problem. And now with AI and gen AI, I think it's become even more critical. Machine learning: I was thinking about this this morning. It means the machine, the computer, is learning, and what is it learning from? The data we give it, right?
[00:21:58] Speaker B: Yeah. And again, look at things like ChatGPT. It's based on a set of data that's a few years old, not the most current data. We're increasingly doing more AI-driven projects now, like gen AI, and it's at the forefront of pretty much every financial organization. We always come back to the first point: how well do you understand your data? Because if you don't understand your data, when you start pushing into where AI and gen AI can really help you, you don't have a yardstick, so you're not quite sure whether it's giving you the right information, or to what level it's right. And of course, we focus on financial services, so the majority of that information is numbers. A six-digit number could be my date of birth, but it could also be a sort code for a bank in the UK.
It could be an invoice number, a product code, a balance. So it's all about understanding the context of that number, and that's where understanding it can truly help. A lot of projects are pushing ahead with the technology, the flashy and whizzy stuff, but they need that grounding of actually looking at the context of the data and leveraging the metadata they have. By all means, work out what you can do in parallel, but if you haven't got a good understanding of your data, you need to start that right now and set it as the foundation of everything you're doing, because otherwise you are...
[00:23:56] Speaker A: And I think because of that I'm seeing an increased focus, a push maybe, toward data management fundamentals. Because what you're talking about, ostensibly, is the data model. If you've got a data lake without the kinds of metadata you need, specifically business metadata, what does that data mean? Like the six digits: what does that mean? AI can't figure that out for you without additional information. If we go back to the early days of Hadoop, and even the beginning of the cloud, I heard a lot of people going, "Oh yeah, we don't need a data model anymore, because it's going to slow us down." Even inside Snowflake, when I first went to work there, it was, "Oh, we don't need to worry about the data model, because the data model is all about performance tuning." And it's like, well, no, it's not. Like you said, with the whizzy things, you get this really cool tool that can process data really, really fast, so the thinking goes that we don't have to worry about the organization and structure of the data, because it doesn't matter anymore. We don't have to have good schemas and indexes.
We can do schema-on-read, just process all the data, and if it's too slow, throw more compute at it. Well, now we've gotten to a scale where you really can't do that.
[00:25:19] Speaker B: Yeah.
[00:25:20] Speaker A: And then you have AI on the other side. Like I said, we're feeding machine learning algorithms to build AIs. Like you said with ChatGPT, that data is what, a couple of years old? And in part that's because of how much power and energy it takes to actually run those algorithms to update the results, right?
[00:25:42] Speaker B: Yeah. And here's a question I tend to ask people a lot, to try to make them think: do you boil the ocean, or do you focus on tangible outputs that help you immediately? The answer is both. Take data cataloging again: we used to call it the Death Star project, because it never finished. That's because you're looking at it from a non-data person's view: "Okay, I start cataloging on Monday the 1st of January, and I want to finish cataloging on December 31st." No, that's not the solution, because the thing you cataloged on the 1st of January will be out of date pretty much straight away. And what do you decide is the source of truth? So it's about doing things incrementally, in a way that's manageable and focuses on tangible outcomes, while building up the entirety of the data in a way that increases your confidence. When you start off, your confidence level is never 100%; let's say you're 95% sure this bit of data is accurate. By the time you get to the end of the year, that confidence has gone down, but your confidence in this other data over here has increased.
That's why we always say the shape of the data is really important, and understanding it. It's very much chicken and egg with the metadata side of it. But to quote somebody, and I don't know who it was that said it, "Unless you start, you never finish." With the metadata part, pick what you can really put context to and what you can provide tangible value for, and then build on that. It's still better than doing nothing, but you've still got that gap in between to do everything right.
[00:28:08] Speaker A: So it's taking a little bit out of the Agile mindset, right?
[00:28:13] Speaker B: Yeah.
[00:28:14] Speaker A: Doing things incrementally. But even in Agile, and people seem to miss this, you still need a high-level architecture and a plan for where you're trying to go. You have to have the big picture; otherwise you don't know if you're building the right thing, even in your sprint, in your little iteration. So it requires both ends, a top-down and a bottom-up approach, but prioritized according to real business cases and real business needs that will deliver value to the business in a short time, in order to remain competitive.
[00:28:50] Speaker B: And I think that's again where frameworks come in. It's happening more and more frequently that somebody says, "This project is agile, but it's got a bit of a waterfall methodology to it as well." It's a combination. And Forward View is relatively small; there are about 75 or 80 of us, and we have to focus. We're not Accenture.
A lot of these large organizations will look at us and say, "Well, we've got Accenture over here, or Capgemini, or these big consultancies. What do you do that's different?" From our perspective, it's about getting the business to drive the technology, rather than the technology driving the business. That's something I've focused on for many years, and I think it's really important. That's also what I like about the seven pillars: they make the point that the technical capability is there. There are so many great tools that can do what you need. But if you let the business problem, the business challenge, and the outcome drive the technology, you get a much more capable solution that will probably get you there cheaper and more efficiently. So yeah, that's definitely a way to think about things.
[00:30:34] Speaker A: Exactly. It's the business focus, and one of the seven pillars is collaboration. It's not just IT off in a corner using cool tools to build something. It has to be done in collaboration, in order to get the right result for the business, really.
[00:30:50] Speaker B: Yeah, yeah.
[00:30:51] Speaker A: And that's whether it's AI or ML or just regular data warehousing; we've got to do it that way. Well, unfortunately we're up against it already. Time went fast. What's coming up next for you? I know you were acquired. Anything on that you want to share with folks before we go?
[00:31:15] Speaker B: Yeah, we're really excited to announce that, about a month ago now, we were acquired by our friends at Nagarro. We've partnered with them for a couple of years, and as I mentioned, we're quite a small organization.
We're quite focused in what we do, which is financial services. They've got 18,000 people in 37 countries. That gives us the ability to really scale, taking some of the boutique and niche things we focus on and really pushing on with them. So we're super excited to bring more of what we do to our customers, at a bigger scale. I think there's a link about our acquisition.
[00:32:08] Speaker A: Yeah, there's a QR code on the screen. Any events or meetups you're going to be at in the near future?
[00:32:16] Speaker B: I'm looking forward to a couple of weeks off toward Christmas. But in the new year, we're planning with Nagarro to be in New York for, I think it's NRF, the National Retail expo, whatever you want to call it, over in DC. That's at the beginning of January. Nagarro will be there with a booth, so if anybody is going to that, feel free to drop in on the booth. Other than that, I've got so much going on in the new year that I haven't even worked out whether I'm coming or going yet. But that's the first date, I think.
[00:32:58] Speaker A: Yeah, it's hard to believe we're already in December, isn't it?
[00:33:01] Speaker B: Somebody said to me earlier, actually, "Oh, we're doing something in January," like that's a bit of time off. And I was thinking it's literally two weeks away when you think about it, because you've got this next week and the week after, and then everyone is...
[00:33:18] Speaker A: Then it's holiday.
[00:33:19] Speaker B: Holiday. Yeah. And then it's straight into January. So it's a lot closer than we think.
[00:33:25] Speaker A: Yep. And we've got your QR code there for your LinkedIn.
People can connect with you on LinkedIn and follow you to see what's happening out there in the world of Forward View. And thanks. I think that'll be great. It's very interesting what's happening, right? Things are going to continue to change well into 2025, and probably accelerate to the point that it makes us all crazy again.
[00:33:52] Speaker B: Yeah, probably. Absolutely. I'm looking forward to 2025. As I say, the change is more about the business driving the technology, and I think that resonates with more and more people.
[00:34:05] Speaker A: Yeah. Well, thanks for joining me today, Carl, and for your insights and being a guest. This has been great. Thanks to everybody else who's online joining us or watching the replays. Be sure to join me again in two weeks, when my guest is going to be independent consultant and Data Vault, DataOps, and Data Mesh expert, my buddy Paul Rankin. As always, be sure to like the replays from today's show and tell your friends about the True DataOps podcast. And don't forget to go to truedataops.org and subscribe to the podcast so you don't miss any future episodes. Until next time, this is Kent Graziano, the Data Warrior, signing off for now.
