Episode Transcript
[00:00:03] Speaker A: Hey, welcome to this episode of our show, #TrueDataOps. I'm your host, Kent Graziano, the Data Warrior. In each episode we try to bring you a podcast covering all things DataOps with people that are making DataOps happen today. If you've not yet done so, please be sure to go look up and subscribe to DataOps.live's YouTube channel, because that's where you're going to find the replays of all of our past episodes. If you missed any of the prior episodes, you can go catch up there. And better yet, if you go to truedataops.org, you can subscribe to this podcast and get proactive notifications about our upcoming episodes. Now, our guest today is the author of How to Succeed with Agile Business Intelligence. He's a consultant and a self-proclaimed BI artist, Raphael Branger. Welcome, Raphael. Glad to have you here today.
[00:00:56] Speaker B: Hi Kent, glad to be here. Thanks for having me.
[00:01:00] Speaker A: Sure. So why don't you give our audience a little bit about your background in data management and your journey with Agile BI?
[00:01:08] Speaker B: Exactly. Sure.
I've been in the data space for more than 20 years now. I started as a 19-year-old report developer back then with Crystal Reports, then moved more into BI platform engineering, and from there into, let's say, BI requirements. And especially this topic of BI-specific requirements led me to Agile BI, because I always felt that people somehow struggle to formulate clear requirements, and that if they just write down the requirements on paper without seeing the data, very often they've already changed their mind by the time we've implemented the solution. And yeah, today I'm working mostly as an Agile BI coach on one end and, from a technical perspective, as a solution architect.
[00:02:07] Speaker A: Okay. Yeah, and I was just going to ask you what you do there at IT-Logix. So you're a consultant and an Agile BI coach. That's good, because there are not a lot of folks who know how to do that. My experience in trying to implement agile in the data warehousing and BI space was challenging, for sure, back when I first started doing it about 20 years ago.
Discovered very quickly that an experienced Scrum master couldn't necessarily help an agile data warehousing team.
They didn't understand the roles and the different skill sets that are in a data warehousing and BI team compared to, say, somebody developing an app with Java. Right? You've got a team of ten Java developers, and you can interchange them for whatever task needs to be done, and you just can't do that with most data warehousing and BI teams.
[00:03:06] Speaker B: I think that's an important aspect. That's why I'm emphasizing BI-specific requirements, or Agile BI: you cannot simply copy a practice which works in software engineering and think that it will work exactly the same way in a BI or data-centric area. But still, you can of course adapt it. This was basically what led to writing the book, where I tried to, let's say, bring all this experience together. And just to make that clear, I'm continuously learning. So the book was just an intermediate state, and I already have a few points here and there where I say, okay, next time I will add this topic or chapter, because I've learned quite a few new things even during the past week.
[00:04:01] Speaker A: Yeah, because I'll say at the high level, when you look at the Agile Manifesto and the principles of agile, the concepts are there. But when you get down to the practical implementation aspects, there is quite a bit of difference between developing, I'll say, a SaaS software solution versus an integrated data warehouse and BI application. Even though, yes, it is all technically software, it's a very different mindset and approach that's necessary in order to be successful with that.
[00:04:42] Speaker B: Absolutely.
[00:04:43] Speaker A: Yeah. You mentioned users changing their minds, and that was always one of the challenges with building a data warehouse: if you started off with a set of questions that you got from the users and you designed a schema specifically to answer those questions, by the time you delivered it there were more questions, and your schema couldn't answer them, because things had changed.
[00:05:08] Speaker B: Exactly.
[00:05:10] Speaker A: And then there's the whole, sometimes the users don't know what questions they want to ask until they actually see the data.
[00:05:19] Speaker B: Exactly.
[00:05:20] Speaker A: And I think that's what led me down the Data Vault path years ago: having a Data Vault underpinning it, which was really kind of, I'll say, solution agnostic and source system agnostic, and then you could build your data marts on it.
As things changed, it was a lot easier to evolve the bi interface aspect of it instead of having to rebuild the entire data warehouse because, oh, wait a minute, we forgot this data that we need for a dimension that we didn't even know was there.
[00:05:50] Speaker B: Right? Yeah, exactly.
[00:05:52] Speaker A: Cool. So as you know, data products have become one of those hot industry buzzwords right now. So I wanted to get your take on that concept. What does that mean, and how do you think it applies in this data landscape we have, and in particular to your work with BI and Agile BI?
[00:06:13] Speaker B: So looking back, and talking about BI requirements, I've been using this term of a data product for ten years already, because I was always too lazy to repeat: okay, report, cockpit, ad hoc report, dashboard, whatever. So even though it's now used in an even broader sense, like data interfaces or providing data as a product, for me, data product means first of all just the visible or tangible outcome of a data platform, for example. And the thing I like most in the case of a data product is really the analogy to a real product in the real world.
Because I think people, especially from a business perspective, often struggle to understand what it actually takes to get data refined from a source system, where the data is somehow created, up and into a report or dashboard, you name it. And in the book I continued this analogy of a data product to say, okay, if there is a product, we need a data production line, like in a real plant. And I think this concept of having a product fits in perfectly, because you have the same thing in a real plant. First of all, when you start designing a new product, you don't use an assembly line for the first piece. That's a prototype, or just a pre-series of a product, where you simply want to try whether it works in the end, where you can double-check with pilot users whether they like it, whether it fulfills their expectations. And then another stage comes in: if you want to, like a car manufacturer, put out, let's say, 5,000 vehicles a week, you can't fabricate every car individually. You need to, let's say, industrialize the production of it. And I think that's the same for data products.
Some data products are ad hoc products in the beginning, but if you want to scale, then I think you, let's say, can apply similar principles like in a real plant to optimize your production process.
[00:08:49] Speaker A: Have you read anything on the theory of constraints?
[00:08:52] Speaker B: Sure.
[00:08:53] Speaker A: Okay, I was going to say, because when you said that, it took me back to my very first agile data warehousing effort in the early 2000s. An Oracle DBA at a conference had recommended we read this book, The Goal, which I assume you've read. And that gave me that whole image of a production line and eliminating bottlenecks in order to deliver. It was before I had read much of anything about Scrum back then. We found that that metaphor really worked well with our team, and we got to where we were able to really productionalize the changes to the data warehouse based on user input and customer input, and it proved to be very successful adopting that particular approach.
I don't think I hear enough people talking about that, and I think that's a great point to bring in when talking about data products. Like you said, treat it like a physical product that's built in a factory: how does that happen, and how do you scale that? And your description is, I think, right on for what we should be thinking about as we try to move into this concept in the data world. I'm glad to hear that you hit on it ten years ago. That's great.
Before data mesh made it popular.
[00:10:21] Speaker B: Yeah, exactly.
I also mention the theory of constraints in the book, because what you mentioned right at the beginning was: okay, if you have the team with the ten Java developers, you can distribute tasks pretty easily. And of course it would be nice if we had data teams which are not only cross-functional as a whole team, but where you have what Scott Ambler calls a generalizing specialist. This generalizing specialist would be the ideal: somebody who can work along the whole assembly line, let's say gathering the requirements, gathering the source data, then doing some kind of data modeling, data loading, data testing, even designing the report and handing over the final product to the consumer. That's of course the ideal case, because you can reduce the overhead of handing over tasks from one role, or one guy, to the other. But the reality is still pretty different. So probably you have in your team a business analyst looking at the requirements first, then you talk to some technical guys about how we could connect to this and that data source, then you have the data modeling guy, then you have the data visualization specialist, and maybe you have a tester. And now the big question is: how can we organize this kind of specialist-oriented organization? And there I think the idea of having a kind of assembly line is pretty interesting.
For me it was an aha moment: if we establish, for example, a continuous flow with a certain rhythm of handing over little increments (getting the data from the source, loading it somewhere else, doing some modeling, then handing it over to the next station), even though it's not the ideal world like we would love to have with the generalizing specialist, it's still a very good way to organize the work within a team and to be able to deliver end-to-end increments, as I call them, within a very short amount of time, like days or weeks.
[00:12:50] Speaker A: Yeah. I think that was one of the keys that I ran into in learning more and more about agile: people got hung up on "we've got to deliver something in two weeks." But I ran into teams that were not doing the end-to-end iteration. Yeah, they were generating a lot of stuff every two weeks, but there was no solution. Right?
I literally had one customer that was so proud of themselves because they'd done, I don't know, probably twenty 2-week increments. And all they'd managed to do was populate stage tables.
[00:13:28] Speaker B: Yes.
[00:13:29] Speaker A: Right.
They were going to do Data Vault. They hadn't even started building the hubs and the links. And they would say, oh, we're agile. It's like, well, no, you're not really agile.
It's great that you're working this way, but you haven't actually delivered a solution. So I think that's important, that whole end to end, because an assembly line doesn't stop at the first station. You don't have all the tires make it to station one and get put on the rims, so that now you just have a stack of tires on rims but still no cars at the other end of the assembly line. That's not that helpful, unless you are just a factory that's making tires. But if you're making cars, that's not that helpful.
You mentioned Scott Ambler, so I assume you're referring to his Disciplined Agile approach.
[00:14:14] Speaker B: Yeah, exactly.
Just one word on this topic of working in one or two week iterations but just building all the tables, or a lot of tables, in the stage layer, et cetera.
If you explain that to someone, even a savvy developer, it's simply more abstract if you just show them an architectural diagram. Before I wrote the book, I had a lot of PowerPoint slides and architectural slides, et cetera, and somehow I couldn't get the message through. Now, since I use the data production line analogy, even in a visual way, I ask people: okay, we are now building the whole plant. Within our plant, we will have multiple production lines, and a production line consists of several machines or workstations, et cetera. Now, of course, we could follow the approach of always building only the first machine for all the production lines in, for example, a two-week or one-month time frame, and then continuing with the next layer or next machine. And then people say: but that doesn't make sense, because the production line isn't finished. Wouldn't it make more sense to have one production line finished as a whole, and then let it run to already deliver value by delivering data products? Speaking in this image language, they immediately understand it. And I think that's one way which is very helpful to explain to a team, but as well to the stakeholders and architects around the team, that it's not only, as you said, about having a two-week iteration plan; it's really having this pivotal mindset that it's feasible to create an end-to-end increment in a short amount of time. But as you also said, it's quite tricky.
[00:16:24] Speaker A: Yeah.
Thinking about the assembly line approach: what's your perspective on DataOps processes and how that all fits into this picture?
[00:16:36] Speaker B: So first of all, when we're talking about building a plant, and not just doing an ad hoc or explorative analysis where I just do a little bit of Python code here or an Excel spreadsheet there, I think that's exactly what DataOps is incorporating: that we have this kind of institutionalized process, so that, for example, data testing is not just, let's say, being done by accident.
I think that's one thing. And the other thing, to stick with the image of the assembly line: it's not done with just building the assembly line. The assembly line itself doesn't provide any value. You need to run the assembly line. You need to let the data flow through the line on a daily basis, or even in a more real-time way, to produce, meaning refreshing, the data product: providing new, more current data day by day or hour by hour. If you go into a real plant, and I've had the chance to do so with many of my customers, even though you don't need that many people to run such a production line nowadays, because everything is automated, you still need some human beings looking here and there in case something goes wrong, adjusting a little bit here and there, or even just doing some quality checks in between. And I think this is exactly the idea of a DataOps process which looks not only at how a solution is built, but as well at how it is operated. And especially having data as our raw material, it's inherently dynamic. Every day, even though it's the same data interface and the same source system, data can change very quickly, and then suddenly maybe something breaks, or at least doesn't behave in an expected way. Combining development and operations, ideally within the same team, so that the people who build the solution also have to run it: I think that's a perfect fit.
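As one illustration of that "humans watching the line" idea, here is a pair of minimal monitoring checks in Python, freshness of a data product and plausibility of the load volume. The thresholds and function names are hypothetical, just a sketch of the kind of automated check an operations process might run after each refresh.

```python
from datetime import datetime, timedelta, timezone

def is_fresh(last_loaded: datetime, max_age_hours: int = 24) -> bool:
    """True if the data product was refreshed within the allowed window."""
    return datetime.now(timezone.utc) - last_loaded <= timedelta(hours=max_age_hours)

def volume_plausible(current_rows: int, previous_rows: int,
                     tolerance: float = 0.5) -> bool:
    """Flag loads whose row count deviates sharply from the previous run."""
    if previous_rows == 0:
        return True  # first load: nothing to compare against yet
    return abs(current_rows - previous_rows) / previous_rows <= tolerance

# A load that shrank from 10,000 rows to 400 should raise an alarm:
print(volume_plausible(400, 10_000))    # False
print(volume_plausible(9_800, 10_000))  # True
```

In practice such checks would feed an alerting channel rather than a print statement; the point is that the line is watched on every run, not only while it is being built.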
[00:18:54] Speaker A: And I think to a certain extent you have to build things with the mindset that it has to be continuously operational.
[00:19:03] Speaker B: Absolutely.
[00:19:06] Speaker A: So is that part of a Disciplined Agile approach then?
[00:19:10] Speaker B: So disciplined agile, definitely. If I had a look at that, writing the book as well, definitely has, let's say, DevOps as a generic term where they describe it. And one reason why, back maybe ten years ago, when we first met Scott Ambler, what we liked about disciplined agile was that it's not as prescriptive in a way that it says okay, you only have an iteration based approach. No, it says okay, we have an iteration based approach as one option. But there is another option, like a flow based approach. Or if it comes to a more explorative thing, especially when it's about prototyping, for example, then it's not about, let's say, having prototyping stories, which we check every two weeks. No, it's maybe having a two hour cycle where we constantly or multiple times a day change our opinion, try to experiment with different hypotheses, et cetera.
That is basically what I personally like: to have some guidance in choosing the right option for the context I'm currently in.
[00:20:28] Speaker A: Yeah. So what do you think about the role of automation in all of this? And maybe it's AI-driven automation in the near future, I assume.
[00:20:41] Speaker B: For me, automation is really key, because of what I mentioned before: even though the idea is clear, okay, let's build the whole production line and then move on to the next production line.
It's quite challenging, because again, we do not just build a piece of software; we have a multilayered system. And especially, we are not at the beginning of the data creation process. With a BI system, we are always somehow part of an existing landscape, and we have things which we cannot control or influence directly. So doing such an end-to-end increment within, let's say, two weeks, or even just one week as an iteration length, is really cumbersome. And if you think about a slightly more sophisticated data processing task, like maybe a little bit of historization here, or delete detection, incremental loading, et cetera, depending on the requirements, of course: if you were always to code everything on your own for every table you need, then even though you do a very thin slice of requirements, I would still doubt that you can really do that in one week or two weeks. And it's the same for manufacturing plants.
They can only produce a significant output at a constant quality level by automating processes. And this is exactly the same in DataOps or in Agile BI as a precondition to having short iterations.
You should think about automating whatever you can. In the book we use another analogy besides the data production line: the Agile BI city. A marketeer had the idea of drawing a city map with different districts, which also reflect, to a certain degree, chapters in the book. One district we visit on this journey is central technology, and there we describe different categories of automation, like data warehouse automation, or ETL/ELT automation, but as well data test automation and deployment automation, also known as CI/CD processes. And last but not least, one that's often forgotten: visualization automation. Even when creating a report or a visualization, like charts and tables, you can apply a high degree of automation rather than, let's say, positioning every pixel individually.
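To make the data warehouse automation category a bit more concrete: instead of hand-writing load code for every table, a tool (or even a small script) can generate it from metadata. Below is a deliberately simplified, hypothetical Python sketch that emits a generic SQL MERGE statement from a table's key and column lists; real automation tools would also derive historization and delete detection from the same metadata.

```python
def build_merge_sql(table: str, keys: list[str], columns: list[str]) -> str:
    """Generate a MERGE statement for one staging->target table pair.

    Hypothetical, generic SQL for illustration only; table and column
    names come from metadata, so the same code serves every table.
    """
    on = " AND ".join(f"t.{k} = s.{k}" for k in keys)
    sets = ", ".join(f"t.{c} = s.{c}" for c in columns)
    cols = ", ".join(keys + columns)
    vals = ", ".join(f"s.{c}" for c in keys + columns)
    return (
        f"MERGE INTO {table} t USING stage_{table} s ON {on} "
        f"WHEN MATCHED THEN UPDATE SET {sets} "
        f"WHEN NOT MATCHED THEN INSERT ({cols}) VALUES ({vals});"
    )

print(build_merge_sql("customer", ["customer_id"], ["name", "city"]))
```

The design point is the one from the conversation: once the repetitive load logic is generated rather than hand-coded, adding one more table to an end-to-end increment costs minutes, not days.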
[00:23:28] Speaker A: Do you think it's possible for organizations to actually deliver this kind of thing at scale if they're not adopting some sort of agile, DataOps, and automation approach to delivering?
[00:23:47] Speaker B: First of all, I cannot prove that it's not possible, but based on my experience, I have never seen it. In contrast, the trigger in my career, when I was still a rather young consultant, was a large waterfall-ish project. The guys who wrote all the concepts spent more than nine months on just the business concept, a technical concept, and an even more detailed concept, et cetera. Then we implemented for twelve months. And literally, three months after the system went live, 24 months in, the customer shift-deleted the whole system because it didn't satisfy the business requirements anymore. I mean, management had changed, and they said, okay, let's do everything differently now, for whatever reason. And the other thing, by the way, was that the IT operations team, which was separate, not the same guys as the developers, simply said: hey, we cannot operate this; it's much too complex for us and the skill sets we have, et cetera. So I'm coming from these bad experiences of trying to have this kind of pseudo-safety, the idea that we can describe everything up front. And if we're talking about companies, but as well data, at scale, you are just multiplying, maybe even exponentially multiplying, the problems of having data as a dynamic raw material which can change immediately, or all the inconsistencies you have in your systems, because even the processes are not consistent in the largest companies. And that's why I come to the conclusion that the variability and the uncertainty you have in these environments are even bigger than in small ones, and therefore you have even more need for an adaptive kind of process, one with this learning aspect in it, so that people can learn about the data they have at hand now, learn about the requirements they have today, and then admit that, oh, our first idea of what we needed was completely wrong, because now that we've seen a first result, we've learned what we actually need. And it's just a pity if you've already spent your entire project budget just to find that out, without being able to change direction, even slightly. So from that point of view, I'm pretty sure that you have a lot of advantages following an agile approach in this context.
[00:26:45] Speaker A: Yeah. So let's shift just a little and talk about the seven pillars of #TrueDataOps. I wanted to get your feedback: is there one of those that you think is kind of critical to success in trying to do these things?
[00:26:59] Speaker B: I mean, all of these seven pillars are somehow important, but if I have to choose one, it's definitely automated data testing and monitoring. And I'll tell you why. Of course you need the pillars for getting the data into your system and maybe combining the data; that's the basic thing. Although if you think about small solutions being built with a tool like Power BI, you may not even need all of that for a small-sized solution. But the most valuable asset you always have in a data-driven solution is the trust of end users in your data. Because if people don't trust the data they see in their report, the effect is very quick: they move away and go back to their own Excel spreadsheet. Not that the quality is necessarily better there, but at least they know themselves how bad the data is, or where the flaws in the data are. Now, if we are providing data products in this industrialized way, where we can also adapt to changing requirements, we really want to make sure that whenever we change something, existing assets or existing data products are not negatively affected by these changes. That's why I say: before you even think about buying a data warehouse automation solution, which is important, start with getting a data test automation solution, because that's your safety net in the end. Make sure that you can deliver high quality from day one, or at least make sure that you know there is an issue in the data before your end users find out.
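As a tiny illustration of such a safety net, here are a few automated data tests written in plain Python: duplicate keys, null amounts, and reconciliation against a source total. The row data and function names are invented for the example; a real setup would run checks like these in a test framework or a dedicated data test tool on every change and every load, before anything reaches end users.

```python
# Hypothetical data product rows (in reality: a query against the target).
rows = [
    {"order_id": 1, "amount": 120.0},
    {"order_id": 2, "amount": 80.0},
    {"order_id": 3, "amount": 45.5},
]

def no_duplicate_keys(rows: list[dict], key: str = "order_id") -> bool:
    """Business keys must be unique in the data product."""
    keys = [r[key] for r in rows]
    return len(keys) == len(set(keys))

def no_null_amounts(rows: list[dict]) -> bool:
    """A mandatory measure must never be null."""
    return all(r["amount"] is not None for r in rows)

def reconciles_with_source(rows: list[dict], source_total: float,
                           eps: float = 0.01) -> bool:
    """The published total must match the source system's total."""
    return abs(sum(r["amount"] for r in rows) - source_total) < eps

# Run the safety net before publishing the data product:
assert no_duplicate_keys(rows)
assert no_null_amounts(rows)
assert reconciles_with_source(rows, 245.5)
print("all data tests passed")
```

The reconciliation check is the one that protects trust most directly: it catches the case where a change upstream silently drops or duplicates rows while everything still "runs green".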
Okay, good.
[00:29:01] Speaker A: Wow, we're like at 29 minutes already. Wow.
I have to skip to what's coming up next for you. Are you going to be speaking at any conferences or meetups in the next couple of months?
[00:29:19] Speaker B: So for the spontaneous ones: tomorrow evening I'm in Zurich for the TDWI BarCamp, or data analytics BarCamp, so feel free to Google that.
Exactly. In Switzerland, Europe, not Sweden. And then, as an online event, I'll join Vasco Duarte, who is also the publisher of our book, for the Product Owner Summit. A very big recommendation for that as well. And then later in June, I'll attend the TDWI Europe conference in Munich, hosting an Agile BI clinic where people can bring their own use cases and get some coaching from my two fellow agile coaches, Jana Nezer and Andrea Weichant, and myself.
[00:30:11] Speaker A: Great. And I see we've got your LinkedIn profile here, so people who want to connect with you and get more information about those events can find you on LinkedIn and see what you're doing.
[00:30:25] Speaker B: Yeah, it would be a pleasure.
[00:30:27] Speaker A: Awesome. Well, thanks for being my guest today, Raphael. This was very interesting talk. Glad to find another somebody else who's studied the theory of constraints and applied that to delivering data products. Now we're going to say it was like it was data products. It's product thinking. Absolutely.
I want to thank everyone else for listening in today, and I want to make sure that you join us again in two weeks. My guest will be the principal data strategist from Snowflake, Dr. Jennifer Belissent. And as always, be sure to like the replays of today's show and tell your friends about the #TrueDataOps podcast. Don't forget to go to truedataops.org and subscribe to the podcast so you don't miss these upcoming episodes. So until next time, this is Kent Graziano, the Data Warrior, signing off for now.