Episode 16

May 15, 2024

00:30:32

Koen Verheyen - #TrueDataOps Podcast Ep.34

Hosted by

Kent Graziano
#TrueDataOps


Episode Transcript

[00:00:04] Speaker A: All right. Hey, everybody. Welcome to our show, the #TrueDataOps podcast. I'm your host, Kent Graziano, the Data Warrior. Each of our episodes, we try to bring you all things covering the DataOps world and the people that are making it what it is today. If you've not done so already, be sure to go look up and subscribe to the DataOps.live YouTube channel, because that's where you're going to find all the recordings after we're done. So if you've missed any of our past episodes, and we're coming up near the end of season two, you can find all of those on the YouTube channel, and you can catch up. Now, if you really want to stay up with what we're doing, you can go to truedataops.org and subscribe to the podcast, and then you'll get proactive notifications. Now, today, this is starting to become an annual thing. We are actually live in person in the same room in Stowe, Vermont, at the Stoweflake Lodge, at the World Wide Data Vault Consortium, actually the 10th anniversary. And today I have with me Koen Verheyen, who's the head of partner solutions at VaultSpeed. So welcome to the show. [00:01:13] Speaker B: Thanks for having me. [00:01:14] Speaker A: Yeah, we don't usually get to do this sort of thing, but it's a very exciting opportunity to do this and have somebody in person in the same room. And, you know, it's an actual fireside chat. There's a real-life fireplace back here, but we decided not to turn it on because it's just a little bit too warm in here if we're sitting here for 30 minutes or so. So if you could tell the viewers a little bit about your journey in the data world and data management and what you do at VaultSpeed. [00:01:48] Speaker B: Yeah. So I started back in the early 2000s. I started at Oracle doing databases and actually started my first data warehouse project in April 2001, together with our current CTO of VaultSpeed, Dirk Vermeiren.
So basically my complete professional journey was together with Dirk; I followed him, he took me along on his professional journey. And so in 2012, 2013, we started doing Data Vault on our projects. We did data warehouse projects, old style Kimball, whatever the customer wanted. And so we started doing Data Vault modeling on our projects and started building a framework to accelerate. We called it our accelerator, to start generating based on Oracle, because back in the days we were a pure Oracle consulting shop. So we started building a framework. I participated in building that framework, and then we decided to create the product company VaultSpeed. And there I'm helping out the CTO on the vision, on the product, discussions on what to do. And my main role is daily conversations with our partners all over the world to help them on the technical side. So I'm part of the partner team helping out the partners. We have the business development managers for the partners, so I'm focusing on the technical stuff. My 20 years of experience in consulting helps me understand. I know the pain, I know what they need to do on a project, so I'm helping them out in that way. [00:03:32] Speaker A: Yeah, that's awesome. So VaultSpeed is actually partnered with DataOps.live. That was a partnership I think you guys formed last year, if I remember correctly. We talked about it here, probably at the conference. So tell me a little bit about that partnership and what you see as the value to your joint customers of marrying Data Vault methodology, VaultSpeed and the DataOps.live platform for Snowflake. [00:03:59] Speaker B: Yeah, so we encountered DataOps.live as a solution. We saw that passing by and we thought, okay, the way the data vault is being built there, with dbt, the building of the data vault model was like manual, and we can be an added value for that.
So we really saw, if you have the DataOps.live diagram picture, in the middle, the real build of the data vault, we saw a fit and we started the discussion with the people of DataOps.live. And so we have a joint customer, which is good. They are using VaultSpeed to build the data vault model and they are using DataOps.live to do the management of all the environments, everything that VaultSpeed doesn't do. And that is where we saw a good fit, because the whole management of environments, the CI/CD pipelines, that is something explicitly that VaultSpeed doesn't do, by choice. We focus on the real data vault generation, the real building, and not on the CI/CD pipeline management. And that is where the partnership is a very good fit. [00:05:13] Speaker A: Yeah, and so I mean, you're really a Data Vault automation specialist, but as you said, it's kind of the core of Data Vault, building the models and the load routines and things like that. So from that perspective, how do you define DataOps really, and where does that really fit into the delivery of a full Data Vault solution? [00:05:39] Speaker B: Building the Data Vault model is, at the end, just the tiny bit. If you use a good automation solution, building the data vault is step one. And you have so much around it: managing the data vault, managing the different environments. And that is where we at VaultSpeed see some of our customers who don't have experience of DataOps or a good DevOps operation, they struggle, they don't know how to manage the dev, test, acceptance, production lifecycle. And that is where we see that a customer needs something like that to do that in a proper way. Otherwise your Data Vault project is at risk. You will have a good data vault model in a few days, basically. But then. [00:06:32] Speaker A: How do you operationalize it? That's why it's called DataOps, right? Yeah.
[00:06:36] Speaker B: Then you're stuck because you might not know, how do I move that to test? How do I move it to production? [00:06:43] Speaker A: Right? Yeah, yeah, yeah. And building those environments, managing those environments, doing version control basically as you go, and testing as well. [00:06:56] Speaker B: A lot of customers we see, they have the experience of doing that for their operational system, but doing the same thing for a data integration platform is a different thing. You need to take into account different elements, which, if you take a look at the DataOps architecture, is completely covered, and the customer needs that. [00:07:21] Speaker A: Yeah. So it probably helps with a more consistent delivery, keeps the ship running, as it were, once you get it started. And as you know, it's iterative, it's a continuous evolution that's happening, and your product helps them with the evolution of the data vault itself. So we get a new source system, we get a new business unit, we get a new business problem, we have to design extensions to the data vault. Right. VaultSpeed is going to help them do that. But still, how do you make that work within the context of what you've already deployed? [00:08:01] Speaker B: Yep, indeed. [00:08:02] Speaker A: So, data products, another really, really hot topic these days. How do you see VaultSpeed working in conjunction with that concept and helping people deliver data products vis-à-vis the Data Vault world and Data Vault methodology, and really kind of do that at scale? [00:08:24] Speaker B: What I see is that when people were doing manual Data Vault projects, or old style, not Data Vault, data warehouse projects, it basically took them six months, eight months, nine months before they could even imagine to deliver something. So if you then struggle for a week to deliver something, then it's only a week.
A week on nine months, it's not too bad, what you lose because you don't have proper release management and all the controls around it. But with the automation, with a data vault, like I said, after two, three days you could basically do your first release. And if you then struggle a week, you basically triple your throughput time in your project. And that you don't want to have. You want to eliminate the time you lose, and have the good structures there, a proper DataOps, so that you don't lose the accelerated time you had with the automation of your data model. [00:09:25] Speaker A: Right. Effectively, in the example you just gave, you basically just tripled your time to value, which is what we're trying to not do. Right. Taking longer than it needs to because you don't have the machinery in place to get it deployed, tested, and all done properly. Basically you've got the prototype built, but you can't get it out there for the business to use fast enough. [00:09:53] Speaker B: Yeah. And then you have your first beta version of your data vault, without much of the business things, the business logic, implemented, but then you're struggling to get it to the business to generate that value. And that is where you then lose a lot of time in comparison to your development time. And then your time to business value grows exponentially, and you want to avoid that. [00:10:17] Speaker A: Yeah. So the next question, an obvious answer probably for you: do you think it's really possible for people to try to scale and deliver this kind of value and data products and solutions at scale, if they're not adopting some sort of agile, automated and really orchestrated DataOps approach? [00:10:41] Speaker B: So you need that. If you're doing Data Vault, you're doing automation. If you don't control your environments, your testing, all those things, I'm going to say it's useless.
But your time to value is so much longer that the investment in using products like VaultSpeed on your projects is wasted. If you want to really consume that value, you really need to have the proper DataOps structure. And we have various customers with some DataOps activity, but never at 100%. And that is where they struggle. And then we see them struggling, then we get like, hey, automation is very good, but we can always go back to the seven pillars of DataOps and say, yeah, but you're struggling not because of VaultSpeed, you're struggling because you don't manage your data quality, you're not managing your environments, those kinds of things. That's a discussion that we then have with our customers. [00:11:52] Speaker A: How often do you see customers, and we'll talk specifically Snowflake space right now, that are trying to do this stuff manually? Way too often? [00:12:03] Speaker B: Way too often. And then they try, left and right, some frameworks or things, and then they have ideas and then they start googling, and then they find some stuff and then they think they've found the holy grail, but at the end they are not covering the whole spectrum in the correct way. And then basically, if you don't cover all the seven pillars to some degree, then, yeah, the pillars are a foundation for everything. If one of the seven pillars is like at ground zero, then everything falls down. [00:12:39] Speaker A: The roof falls down. Yeah. If you take out one of the pillars, architecturally, it doesn't hold up. Because I think that we see a lot of people, and back to even the beginnings of the thought process of developing the seven pillars and the beginnings of developing even the product DataOps.live, everyone seemed to be focused almost exclusively on CI/CD. It's like, how fast? Continuous integration, continuous delivery.
And like you said, you could google it, you might find some open source packages that kind of, sort of, helped, but you still had to do a lot of engineering yourself around that. [00:13:18] Speaker B: Yeah, indeed. And that is what we see with our customers: they have some kind of CI/CD pipeline management, but with the enterprise customers usually that is so big, so robust, so difficult to change. And at the end, like I said before, it's oriented on their operational systems and not on their data integration platforms, with their specific needs of testing of data, the environments, where do you get your data from, all those things. They have never really thought about how to manage that in a proper way on those types of data integration platforms. [00:13:59] Speaker A: That's actually a really good point, because when we started off down this journey of DataOps, a lot of people said, oh, it's DevOps for data. Well, okay, but what did that really mean? And like you said, doing DevOps on software development projects for specific operational systems is different than managing the lifecycle of data in an analytics platform, and implies a lot of different things, which is where we've come to this sort of evolution of DataOps instead. And it sounds like a lot of customers, and I think this is the journey, they think that they know how to do it. But again, without referencing the seven pillars, and I'll put a plug out there again for truedataops.org, where we articulate the seven pillars, people don't think about it necessarily in that context. They get stuck kind of on the CI/CD side. Right? [00:15:01] Speaker B: Yeah, a lot of people seem to focus purely on that. [00:15:05] Speaker A: And then using GitHub to put their code in, right? Yeah, right. They do the version control there.
So from an automation perspective, I mean, again, this is like a loaded question for you because you're an automation vendor, but maybe you can talk about the struggles you've seen with organizations that don't adopt automation, especially from a Data Vault perspective, that are trying to build their data vault manually in Snowflake. I mean, we're talking about the seven pillars, and, okay, manually managing the environment in Snowflake, manually doing their zero-copy clones, things like that, that's one thing. But then trying to manually build data products instead of using automation. [00:15:51] Speaker B: Yeah, these days we see a lot of customers that we talked to like a year ago, two years ago, and they said, no, we're going to do it manually. They start coming back to us, they start really seeing the pain, really seeing the value of automation. And that's where we see them struggling doing things manually. There are also a lot of small, homemade little frameworks that do very specific things, but very specific to the needs that they currently have. And if they need to change something, like some kind of table setting when deploying, then they need to rewrite a lot of code. And something that we see now: customers tending to get rid of those homegrown frameworks. You always have that one person in your data team that loves that kind of thing, to build that little automation thing that they need at that time. But at the end that gets into. [00:17:00] Speaker A: The standard buy versus build conversation, and really kind of the total cost of ownership. Because what you're describing is basically they're running into technical debt on the little piece of software automation that they wrote, and then, well, their one engineer who wrote that has to go back and fix it, and if they're working on that, they can't be working on delivering value to the business now. [00:17:23] Speaker B: Yeah, indeed.
And that's now the typical thing that we see with our prospect customers. They return from that "we're going to build it ourselves" of a year, two years ago. [00:17:36] Speaker A: So do you have any advice for, you know, someone who's in an organization and, you know, kind of gets it, maybe looked at the seven pillars and realizes that probably building the stuff themselves is not necessarily the best approach? What can they say? How can they convince other people on their team and maybe their manager? Are there a couple of things that they could maybe point out that would help them sell the idea that, hey, we need to go find some automation that's managed by somebody else, that's built and already tested? [00:18:13] Speaker B: So if you go back to the seven pillars, if you use a software package like VaultSpeed, then three of the seven pillars are like 100% clear. Everything is done and dusted. It's clear, everything is tested. You can count on it that it works, it does what it needs to do based on your complete setup, and then people don't need to worry about those three pillars and they can focus on the four other pillars. And you might see, certainly with the enterprise customers, that some of these pillars are like semi-filled, because they are doing some things in those directions, like the environment management and dev, test, acceptance, production. They do something, they need to follow certain rules, they have their complete framework built in to create new platforms and stuff like that. But nevertheless, for the greenfield customers that are starting up and want to have very fast business value generation: use automation to build the three pillars, use DataOps.live to build the four pillars, in a matter of weeks, and then together after a few weeks you have business value. For me, greenfield customers need software to support them in the build phase and in the deployment and consumption phase.
The other four pillars. [00:19:50] Speaker A: Yeah. I'm still amazed, and I heard some stories here this week about how people are still doing things. And I keep asking, it's 2024 and I'm hearing people still doing things the way we did it 10, 20 and in some cases almost 30 years ago, and trying to manage these things. And if we think about it, the size of the data and the variety of data that we were dealing with, say, 20 and 30 years ago is nothing in comparison to what organizations are having to do today. And then you throw in things like GenAI on top of it, and unstructured data and images and text and all of that. And it baffles me constantly how people can continue to go down this old path, when, I mean, like you said, you started in what, 2001? And very rapidly, even in your practice working at Oracle, it was, we need to build some stuff to augment what we're doing, to make this better. And so you've been on this journey for over two decades now yourself. [00:20:56] Speaker B: I'm getting old. [00:20:57] Speaker A: No, no, I've been on it. I'm approaching four decades. So yeah, you got a ways to go, but yeah, at least you started, you know, in the current century, right? I saw a meme recently that says, oh, don't worry about me, I'm from the 1900s, right? It's like, yeah, that was last century, wasn't it? But I'm still seeing some of those problems, and I haven't figured out, you know, what's the best way to try to explain to people: you need to evolve, right, and evolve your systems. I think the seven pillars are definitely a great way to try to articulate what we should be doing. But getting people on board with it is difficult. Difficult, yeah. [00:21:37] Speaker B: And certainly for enterprise customers, feeling the need to change their approach, change their way of thinking, is sometimes hard. [00:21:48] Speaker A: Yeah. And that brings me to the people, processes and technology question: you know, what have you seen?
What's the biggest barrier to success for these kinds of organizations? [00:21:59] Speaker B: People. [00:22:00] Speaker A: It's the people. [00:22:01] Speaker B: It's always the people. A fool with a tool still remains a fool. [00:22:06] Speaker A: Oh, man. Yeah, I've heard that one on and on for years. Yeah. [00:22:11] Speaker B: Using VaultSpeed, you can build a very nice data vault. But if you're starting with a wrong conceptual model... I know people are trying to do stuff with AI to help analyze the business, structuring, generating a conceptual model. But up to now, you still need somebody to build that conceptual model so that you can base your data vault on top of that. Because if that person, using a tool like VaultSpeed, is generating a data vault model that doesn't reflect the business, it is useless. Your technology, your processes can be very good, but if the person that is using all this magnificent software and packages gets it wrong, then you have absolutely zero business value at the end. [00:23:10] Speaker A: Yeah. So from a process perspective, they've dotted the i's and crossed the t's, but the sentence they wrote makes absolutely no sense. And we were talking about that earlier in the conference with GenAI, and what they call hallucinations: sometimes you put something into ChatGPT and what comes out isn't even remotely related to the question that you asked. And Heli was even talking about asking it to do some translations, and it actually invented a new language. It was supposed to be translating something into Finnish, and in the past it had been able to do it. And it's like, okay, that's Finnish, it makes perfect sense. And then the next time it comes out and it's not English, it's not Finnish, it's like it invented a completely new language that makes no sense to anybody, and only the AI actually might understand.
If you ask the AI to translate, the AI might not even be able to translate it, even though it generated it. But yeah, it looks like you're doing the right thing, but no. Yeah, yeah. And I think there's a lot of people, I'm getting this question all the time: well, can't we have AI generate the model? And I think what you said is like, not yet. No, you gotta have the right information. [00:24:32] Speaker B: I've had some good conversations, because for our customers who are using VaultSpeed, because you're using automation, building that conceptual model, that is the struggle, that becomes the longest part of your entire process. So I've had some good conversations. People are trying to make things, based on AI, that can help the analyst detect what is there at the customer, based on all the data that the customer provides. And yeah, that will accelerate the project. And perhaps in two, three years' time we'll be sitting here and we're saying, yes, we have this framework and it can generate, based on my entire SharePoint content. [00:25:25] Speaker A: Right. [00:25:25] Speaker B: Generate my conceptual model of my company. [00:25:29] Speaker A: Yeah, well, I think, and probably we talked about this at the conference a couple of times too, you really need the business ontology, the taxonomies, the semantics of the business somewhere that the AI can consume it and understand it, in order to even begin to kind of put out a valid business conceptual model. But still, I think it's going to take the people to review it and say, yes, that's a correct interpretation. [00:26:01] Speaker B: Yeah. And what we sometimes see: you have, on the source platform, some comments on a table, and the table is called XYZ. And in the comments of the table it says customers. You can imagine writing something to consume that metadata of the physical source system to then propose to the user of that platform:
This table might be your customer data. Yeah, but there are so many varieties of source platforms, ways of documentation. Very difficult, I think, to build something that can be productized and be used like everywhere. If I was a customer and I had the need for that, and you have that type of documentation in your source system, I would try to consume it, but then it's your single-usage thing. And building it out and saying we're going to cover all types of platforms, all types of elements, it will be difficult to productize and build that. I think a challenge for the people that are focusing on that. [00:27:20] Speaker A: Yeah, it's definitely a challenge. Like I said, this time next year might be a different conversation. I guess we'll just have to wait and see. [00:27:27] Speaker B: Yeah. [00:27:28] Speaker A: So what's next for you and for VaultSpeed? You got any more shows coming up here after the Data Vault conference? [00:27:36] Speaker B: From a VaultSpeed perspective, we have a lot of things popping up. It's really event season. Last week people of our team were in Stockholm. This week we are here. There are several shows in Germany. We have events: the big Snowflake Summit, and the week after Snowflake Summit, Databricks, the same venue, week after week. [00:28:02] Speaker A: Well, it's not like last year, when the Snowflake Summit and the Databricks conference were the same week. Vendors were having to split their teams. At least this year it's two weeks in a row. But unfortunately, that means some folks are going to end up there for two weeks, right? [00:28:18] Speaker B: Yeah. So then later this year, there are also a lot of events. It's always a puzzle at VaultSpeed, who will do what at what time, because things change and focus changes. So we need to plan. But yeah, I'm available for the team to go wherever I can assist.
Also for me personally, as a partner solution architect, being at a show where there are more SIs walking around and having conversations is, for me personally, more interesting than a customer-focused event. So that's also an evaluation that we make internally, of who does what and where we put our focus. [00:29:01] Speaker A: Yeah. So what's the best way for folks to connect with you? [00:29:05] Speaker B: LinkedIn, of course, Koen Verheyen. If you search Koen Verheyen VaultSpeed, you will get the connection there, and then reach out on LinkedIn. Then we can connect through mail and then we can have conversations, meetings, whatever. [00:29:22] Speaker A: Awesome. All right, well, thank you for being my guest today here in Vermont. I'm glad we were able to do this in person by the non-burning fireside. Luckily, yeah. Thanks everyone out there for watching today, and make sure to join me again in two weeks. I'm going to have the co-founders of Coalesce.io, Armon and Satish, joining me to discuss what they're doing at Coalesce and their partnership, also with DataOps.live. And as always, make sure to like and repost the replays of our podcast here today, and tell all your friends about the #TrueDataOps podcast so they don't miss out on this content either. And as I said before, you can always go to truedataops.org and subscribe so that you get the notifications for this podcast coming in the future. So again, thanks everybody. And we're signing off now. Live from Stowe, Vermont, this is Kent Graziano, the Data Warrior. See you next time. [00:30:27] Speaker B: See you.