Episode Transcript
[00:00:04] Speaker B: Welcome to this episode of our show, True DataOps. I'm your host, Kent Graziano, the Data Warrior. In each episode, we bring you a podcast discussing the world of DataOps with the people who are really making DataOps what it is today. So be sure to look up and subscribe to DataOps Live's YouTube channel, because that's where you're going to find the recordings of all our past episodes. If you missed any of the prior episodes, now's a great time to get caught up while we're in the middle of the season. Better yet, you can go to truedataops.org and subscribe to the podcast so you don't miss any future episodes.
Now, my guest today is author, thought leader, advisor, agile data modeler and methodologist, and renowned international keynote speaker Scott Ambler. He's currently working on a new book tentatively titled Continuous Data: Enabling Data-Driven Decision Making and Corporate AI.
I've known Scott for quite a few years, because he's one of the few other people in the world who talked about agile and data in the same sentence, and people thought we were nuts when we started doing that. I recently had the pleasure of attending a workshop he gave on continuous data warehousing, and in it he talked about one of my favorite topics, DataOps. So I figured, why not have Scott on the show?
It should be a very interesting discussion. Welcome to the show, Scott. Good to see you again.
[00:01:27] Speaker A: Yeah, great to see you, Kent. Thanks for having me on. It's always a pleasure.
[00:01:32] Speaker B: For the folks who don't know your extensive background, would you care to give us a little bit of an overview of your career in agile, data management, modeling, and all the things you've been into, some of the books you've written, and the methodologies you've pioneered?
[00:01:51] Speaker A: Yeah, sure. So I'm one of those annoying old guys who was doing data warehousing long before it was called data warehousing. I'm sure you've heard that story a few times. In the 90s I got interested in modeling and methodology and all that good sort of stuff. At the time I did a lot of object orientation and a lot of heavier CMM, and eventually CMMI, compliance work, and I was interested in UML and did some of the original writing there. Actually, I wrote the first published article on UML, for Object Magazine.
[00:02:29] Speaker B: I didn't know that.
[00:02:31] Speaker A: Yeah, yeah, it's... you know, I even beat the UML guys out on that one.
And yeah, I did work on the Unified Process. But then when the agile movement came along, I knew most of the people who were actively involved back in the day, and it spoke to me. I was already doing some Extreme Programming stuff; I'd taken a workshop with Kent Beck a couple of years before the manifesto was written, and I was working on what became the Agile Modeling methodology. This was a lightweight approach to modeling and documentation, which at the time was radical. It was all CASE tools and heavy modeling up front and all this theoretical nonsense that just didn't work well.
But the CASE tool vendors were making gobs of money, so that was what really counted. And I came along and said, you know what, most modeling seems to be getting done on whiteboards and with sticky notes and paper and all that good sort of stuff. The fancy tools are great and have their place, but let's actually talk about what we're doing in practice and how it fits in with everything else. That was the focus of Agile Modeling. Then, in the early days of agile, a bunch of us got together and started talking about, well, what about this data stuff? Because if you can't do data in an agile manner, then agile software development is going to tank, because we're all working with data, like it or not. So we started working through some of the implications of that, and we came up with coherent strategies around testing and database refactoring, which was totally radical back in the day.
And the traditional data community didn't want to hear a thing about that. We just got hammered constantly by these theory knobs who'd say, oh, this can't be done, it's not possible. And we'd point out, well, wait a minute, database refactoring is possible and highly desirable, here's how you do it, and here's the source code to do it, if you have the ability to type in half a page of code from a book.
We had these just insane arguments over their theory versus reality. Anyway, that was okay. I was working at organizations around the world, and I worked for IBM for a few years. At IBM I worked with a bunch of people, but particularly Mark Lines, to develop what became the Disciplined Agile toolkit, which was all about how we do real agile development at scale in the complex scenarios we actually deal with, as opposed to the simplistic fantasies a lot of the gurus were talking about. Data was of course an important aspect of that, as were modeling, documentation, governance, and many other swear words for the agile folks. Then I helped organizations do this stuff.
When we had the Disciplined Agile company, which focused on consulting and training, I did a lot of the data stuff: helping organizations do data warehousing, apply agile techniques to the data world, and all that good stuff. That's basically what I've been working on since.
I worked for PMI for a few years, helping them roll out DA.
And then after leaving PMI, I went back to school and got a master's degree in AI. I just finished it a while ago; I graduate this summer, to be above board about that. But I've passed, and my marks are in hand.
So it's all good. And so I've gone back to my roots. A few years ago I was focusing on agile data warehousing, and that evolved over time as I've been teaching it and helping organizations adopt it and figure it out.
The way I was teaching it, we started out with "traditional is not working so well, here's how you do agile." Then we'd look at how agile doesn't work out so well either, because there are a lot of good things about it, but there are a few real serious challenges with it. And the course ended with: what you really need is a continuous approach, based on more of a Kanban, product-oriented, continuous delivery, DataOps type of strategy, rather than the Scrum-based agile thing that people are sort of force-fitting onto data warehousing. So I eventually came to the conclusion: why am I still calling this agile data warehousing when I'm really getting people to the point where it's continuous data warehousing? So that's how the continuous data stuff came about.
[00:07:37] Speaker B: Yeah, no, I remember back in, what was it, probably 2011 or 2012, working on a Data Vault project here in the Houston area for a big oncology network. We were using Jira and a Scrum approach, and two of us kind of managed that and made sure everything worked. At one point management decided we were going to switch to Kanban, so we ended up having to roll right over into the Kanban thing. We had already gotten people thinking from a product perspective, because we had product owners, as Scrum definitely has people doing, and we had gotten the business involved. So it wasn't too bad a transition, although it was a little bit of a struggle at first. The biggest problem we had was the release manager: he just did not stay on top of it. We had a backlog of stuff that was ready to go to production that never got approved, because the guy who had button control in Jira to move it to production was not on top of his game. That made it a little challenging, but it worked okay. That was long before I got involved in DataOps; the movement hadn't really even started back then. It evolved over the last decade. Just like we went from agile software development to applying agile to data, we've gone from DevOps, continuous integration, CI/CD, and all that, and it has now turned into DataOps, right?
So we've been looking at that on our podcast for the last couple of years, thinking about the concepts of True DataOps and how they've evolved. And you've kind of been right in the middle of it; I was pleasantly surprised when you had a big slide on it in your workshop when we were doing the Data Vault training in Orlando back in December.
Things have really come a long way, and you're helping people learn how to do that and learn better ways of working.
Give us a little feel for your perspective on how this space has evolved in the last couple of years, and how some of this led to your continuous data warehousing concept.
[00:10:09] Speaker A: Well, yeah, I think things have gotten a lot better. From my point of view, the overall movement has been really slow. We helped kick things off in the early 2000s, and here we are almost 25 years later, and it's really only been in the last few years that DataOps in particular has taken off. It's been slow, but then again, stuff happens, I guess. We've seen gradual improvement over time. It hasn't been the insane steamroller we saw with the rest of the agile movement, which has sort of crapped out now, but it's been this realistic "let's adopt it and figure it out" approach.
But you know, some of the challenge though has been on the traditional side of things.
It's harsh to say, but we've sort of had to wait for the traditional thought leaders in the data movement to die off and get out of the way.
Thank you for your theory; it was awesome in the 70s, maybe the 80s, but it has really harmed the data community in many, many ways. Finally, we're seeing more adoption of these agile data and DataOps techniques, which is great. And I think a lot of that is due to the reality organizations face in the VUCA world. I hate using VUCA because it's turned into a buzzword, but the reality is we're getting hammered by volume, we're getting hammered by the incoming data rate, we're getting hammered by change. We can't go off into a fantasy land like the traditional people want to and model the living hell out of things for months or years until we finally do something. It's like that guy with his finger on the button you were talking about: that's useless, right? He's a serious problem, and I'm sure he would spin a story about quality, but no, if you leave quality to people, you're in trouble. You've got to automate the living heck out of everything, and we're finally learning that in the data space. So it's no longer tolerable for data people to work as slowly, and frankly with as low quality, as the traditionalists want to. We need better quality, we need better speed, we need better predictability, and the business can't tolerate anything less. I think a lot of the data people in many organizations have been dragged into the 21st century, sometimes kicking and screaming. But then again, I think a lot of people do see it. I'm working with one team right now, and they get it. They need help to pick up the skills, but they get it. They realize they need to improve, and they can't release their data warehouse once every six months when they've got new requests coming in daily. And yet it's still like, "Well, we'll put that in the next release, next year." Come on. That's not acceptable.
[00:13:21] Speaker B: No, that's a hard no.
[00:13:23] Speaker A: Yeah, it's a hard no. So I think reality has forced people to start dealing with reality, and the ivory-tower traditional fantasy theory world is done. Everybody's done with that, whether or not some of the more traditional people recognize it, and we need to move on. And there's good stuff there, don't get me wrong, but you've got to pick the nuggets out: keep the baby, throw out the bath water. And there's a lot of dirty bath water in the traditional world. We just need to purge it.
[00:14:06] Speaker B: So it's taking a more pragmatic approach and getting things done. I mean, back to the Agile Manifesto, that was part of it: the goal is to deliver stuff that works and that's valuable to the business. And yeah, it's hard to believe that, like you said, 25 years later we're still talking about that. I guess I've been lucky in the roles I've had in the last five to 10 years.
I haven't had to deal with a lot of traditionalists, because going over to Snowflake, I think by definition you're not dealing with traditionalists anymore. I was there early on, in 2015 and 2016, and the people who were adopting the cloud were generally not traditionalists, or they had already thrown out some of the traditional theory because it didn't work. Otherwise they wouldn't have been willing to try things like Snowflake and Data Vault and the things you and I have been involved in.
[00:15:04] Speaker A: Yeah, exactly. And in the comments we're seeing people weigh in on this, and one of the comments is: practice without theory is rudderless.
[00:15:12] Speaker B: That's awesome, Doug.
[00:15:13] Speaker A: Yeah, that's absolutely true.
So, yes, I beat up on the theory guys a bit.
But the stuff I promote has its foundations in theory. It goes two ways, right? Theory without pragmatism and actual application in reality is no good either. I think we just had too much theory and too much wishful thinking. If you look at traditional practices, which I've done: this was one thing with Disciplined Agile, we put practices and strategies into context. It wasn't just agile; it was unfortunately misnamed. We looked at a lot of traditional practices, put them in context, and defined here's when we'd use this, here's when we would not use this, and here's why.
And I think the challenge with traditional practices is that they do in fact work when you're in a context where you can tolerate working that way. But if that's not your context, and it rarely was, then they fall apart, which is why we see such huge problems with data quality, for example. There are a lot of great ideas about how to get better data quality in the traditional world. Well, I'll just let their track record speak for itself. Yeah, great ideas, but they didn't reflect the context of the situations people actually faced.
A multi-year strategy, or a strategy with a multi-year payback, is pretty much guaranteed to fail in corporate America. We can't possibly keep focus that long, and the plug's going to get pulled on that, guaranteed. So yeah, great idea, but it's not going to work in practice. So let's do something that has a better chance of working. Nothing's guaranteed, of course.
[00:17:21] Speaker B: Okay, so from your perspective, can you define what DataOps is in this context, and how important do you think it is today in the data world?
[00:17:32] Speaker A: Yeah, so I'll answer the second question first: I think it's incredibly important.
As for how I define it:
I think there are two versions of DataOps.
There's the software engineering version, where DataOps is basically the data aspects of DevOps. If I'm an application developer building something, yeah, I'm building code, and it's got some pretty interface and stuff like that, but there's data stuff going on too, right? So in one respect, DataOps is merely the data aspects of DevOps, just like DevSecOps is the security aspect of DevOps, and so on.
So that's one version. The other version of DataOps, which is the one I'm really interested in, although the other one's interesting too, is when you're on a data initiative: you're building a data warehouse or putting together some sort of data product, and it's mostly a data-dominant effort. Then suddenly I hope you're a little more serious about DataOps, because data is at least a critical concern, maybe the overriding concern, in that effort. So maybe it's DataOps Light and DataOps Serious or something, I don't know. But I do look at it as two different aspects, and those are two different communities as well.
When you're coaching the "DataOps is just a part of DevOps" crew, you're dealing with different issues; you're explaining data stuff to those folks. When you're dealing with data practitioners and bringing DataOps into a data initiative, you're almost always dealing with software engineering issues. They've got a handle on the data; they're probably pretty good at that. But they don't know how to apply modern software engineering concepts in that space, because it's harder. And this is part of the message for the application developers and software engineers: doing DevOps for data is significantly harder than just DevOps out of the box, because data is persistent and you can't break the data, or at least you shouldn't.
[00:20:07] Speaker B: Yeah, applying version control concepts to the data is one of the big things we have run into. Environment management for data is different: you're talking about databases and schemas. Do you have a dev version, a QA version, a production version? Where do they live? How do you manage them? How do you move things between those environments? It's not just about deploying code.
[00:20:34] Speaker A: Exactly. The fundamental story there is that all of the classic DevOps practices, like refactoring, automated testing, continuous integration, and continuous deployment, work in the data space, but they're all far more complicated because of persistent data. And you've got to understand the nuances of that. That's basically what I end up having to coach, because there are always some weird data gotchas that the engineers have never run into before, or have run into and ignored without understanding that they were in serious trouble.
[00:21:14] Speaker B: So it's been four years since we first put up the truedataops.org site and wrote the Dummies guide to DataOps, where we came up with the seven pillars of True DataOps. So I wanted to ask you, from your perspective on continuous data warehousing, how do the ideas of the seven pillars align with what you're talking about?
[00:21:41] Speaker A: Yeah, it's pretty good. The one issue I would have with the seven pillars is the ELT one, for a couple of reasons. I think we've learned as a community that lifting and shifting might not be that great of an idea. Sometimes you're stuck, right? You've got to lift and shift into whatever your environment is for quality data processing, and whether it's in the cloud or on-prem doesn't really matter. But you've got to lift and shift to get into your environment somehow, right?
[00:22:13] Speaker B: Yeah.
[00:22:13] Speaker A: But once you're in your environment, lifting and shifting is not that great of an idea anymore. I think we're learning that. Some people have learned it and understand it; some people are still learning the hard way. But that's okay; we're getting there. My issue with ELT is the T.
I am phenomenally unhappy with the concept of cleansing copies of data.
My philosophy is fix the effing source; otherwise you're wasting your time. And I don't want to hear any excuses. Yes, these source owners are hard to work with, boo hoo. Project managers, boo hoo. Executives, boo hoo. Now, if the issue is that you don't own the source, like you're working with PeopleSoft data coming in, for example, okay: beat the living hell out of the owners of that data. If they've got data quality problems that are harming you, they've got data quality problems that are harming a bunch of other people too. So motivate them in some way to step up and fix the source they own. If your company owns the source, do your job and fix it, period. I have zero tolerance for the excuses. Now, if another organization owns the source, okay, fine, you've got to do what you've got to do. But part of doing what you've got to do is making it very uncomfortable for them to send you low-quality data. We as professionals have got to step up, because, and we'll beat up on PeopleSoft here, if the PeopleSoft guys don't fix their data quality issues, then it's never going to get any better.
So.
[00:23:57] Speaker B: And I think in Dan Linstedt's new CDVP, Data Vault 2.1, the idea of that feedback loop and write-back is, at a minimum, what we've got to do. We've got to make sure those sources are getting fixed, even if in the interim we might have to make some transformations on the data warehouse side to get the reports the business wants. Those are the things Dan called soft business rules, right? And they go as far down the line as you can. Like you said, we don't want to be copying data and then changing it. And even in our seven pillars, it was ELT in the spirit of ELT, and what we were really trying to get at is that it's mostly EL: get the data out of the source and into your data platform, and the very last thing you do is any kind of transformation. And of course, I'm such a huge advocate of the Data Vault methodology: you apply the transformations as far down the line as possible, for auditability reasons and because people change their minds about how they want the data interpreted. So the T is kind of a little t in my mind. The big T, the T you're talking about, yeah, I'm with you. It's like, let's not be...
[00:25:14] Speaker A: You're stuck.
Yeah, you're stuck. If you don't own the data, then you're stuck. One of the comments in the chat is that it's not a priority for these owners. So make it a priority. This is why I say beat the living hell out of them. When I consult with organizations, there's always some vendor they're working with for their data, and they're always complaining about it, all the usual stuff. Well, my question is, what are you doing? Why can't you put pressure on them? If we all just say, yeah, nothing we can do about that vendor, then that vendor will never, ever do the right thing, because they want to make money and they just don't care. So we've got to step up. I think there's a culture change here: we should be intolerant of poor-quality data, period. And the first step is to be intolerant of poor-quality data that you own and control. It might not be your team, but if your organization owns and controls it, you should be intolerant of that poor-quality stuff. And then maybe if PeopleSoft were to adopt that culture within their own organization, they'd change too. But we do need to put pressure on these other groups, and hopefully it works out. So yeah, it's a bit of theory, but what else is going to work? Minimally, we need to step up and do the job we should be doing.
[00:26:57] Speaker B: Unfortunately, we're running out of time, so I want to get to the question on AI. Since you've recently been studying AI and getting your master's, what do you think the role of AI is for data engineering, DataOps, pipelines, and all of that?
[00:27:13] Speaker A: Yeah, so I think there are two roles. The first is that there are significantly better tools in the data space now because of AI technologies. I'm seeing significant improvements in productivity, or at least potential improvements, because you've got to adopt the tools and all sorts of stuff. But I've seen significantly better tooling come out in the last few years. I think the data community can step up and build their own tools now, because of LLMs and other stuff. So there's some really good stuff going on there. The second is: as data professionals, what is our role in corporate AI, in building AIs within our organizations to serve our customers? There's a lot of data science stuff there, but it really boils down to: are we providing the high-quality data, in a continuous manner, that these AI models need in order to better serve our customers? Without high-quality data coming in at the rate it needs to come in at, we're out of luck. It's a garbage in, garbage out type of thing. So we can use AI to make our lives better, and we can make the AI better by producing the higher-quality data it needs. I think it's a really interesting space right now for data professionals.
[00:29:04] Speaker B: Yeah, no, absolutely. And things are changing effectively daily now.
[00:29:08] Speaker A: Yeah, changing for the better.
Change isn't always as good as it sounds, but this is change for the better, which I'm really excited about.
[00:29:18] Speaker B: Awesome. Yeah. So what's next for you? Events, meetups, classes where folks might find you.
[00:29:25] Speaker A: So I'm going to be speaking at Agile India at the end of March. That's always a great conference if you're based in Bangalore.
I co-teach the Data Vault workshop with Dan Linstedt and Cindi Meyersohn on a regular basis.
That's advertised via Data Rebels. I teach the one-day continuous data warehousing workshop as part of that, so that's a lot of fun. And hopefully you'll see the Continuous Data book come out sometime this fall, if I stay reasonably on schedule. So it's all goodness.
[00:30:08] Speaker B: Yeah. All right. And the best way to connect with you is LinkedIn?
[00:30:12] Speaker A: Yeah, just do a search on me on LinkedIn, or go to scottambler.com or ambysoft.com as well.
[00:30:19] Speaker B: And we've got the QR code on the screen for folks; if you've got your phone handy, you can just scan that, and it will take you right to Scott's LinkedIn profile. Well, thanks, Scott. As I expected, we did not even remotely cover half of the topics you and I could have been talking about in the time we had. I love having you on the show and hearing the insights you have. We've been crossing paths in the Data Vault world and at the Data Vault conference for probably a decade or more now.
So thank you for the contributions you've made to the agile world and now to continuous data warehousing. I really do like that concept a lot better.
[00:31:01] Speaker A: Thank you.
[00:31:02] Speaker B: Yeah, I do like that. Thanks to everybody who's watching and joining us online. Doug, thanks for all the comments. It's great to see a comment thread going on the podcast for a change, and awesome to have that interaction. Everyone else, again, join me in two weeks. My guest is going to be industry analyst and friend of the DataOps family Sanjeev Mohan, so we should have an interesting talk about the state of DataOps in the world today.
And as always, be sure to like the replays from today's show, tell your friends about the True DataOps podcast, and don't forget to go to truedataops.org and subscribe to the podcast so you don't miss out on any future episodes. So until next time, this is Kent Graziano, the Data Warrior, signing off for now.