Episode 48

March 12, 2025

00:34:34

Sanjeev Mohan - #TrueDataOps Podcast Ep. 48

Hosted by

Kent Graziano

Show Notes

A former Research VP, Data & Analytics for Gartner, Sanjeev was bitten by the data bug 30 years ago, and it's never let go. It started with an accidental introduction to dBASE III+ and years of being an Oracle evangelist, and he's now deep into Big Data and the plethora of technologies that complement the Data Science space.

In between, he's helped the world's largest companies select the right cloud platforms, has run large mission-critical projects, and assisted senior management executives in making tough strategic decisions to ensure optimal solutions to complex problems such as the Wells Fargo / Wachovia merger.

On top of all that, Sanjeev has published many white papers and has been among the top 10 speakers at numerous conferences such as Oracle World, BI Brain Trust and Informatica World. He created University of California, Berkeley Extension's advanced Oracle training curriculum.


Episode Transcript

[00:00:00] Speaker A: Welcome to this episode of our show, True DataOps. I'm your host, Kent Graziano, the Data Warrior. In each episode, we try to bring you a podcast discussing the world of DataOps with people who are making DataOps what it is today. So be sure to look up and subscribe to the DataOps Live YouTube channel, because that's where you're going to find the recordings of all our past episodes. If you missed any of the prior episodes, that's where you can get caught up. Better yet, if you go to truedataops.org, you can subscribe to the podcast and you'll get notifications of all of our upcoming episodes. Today my guest is author, podcast host, analyst, and thought leader Sanjeev Mohan. Welcome back to the show, Sanjeev. I think this is the second or third time I've had you on. [00:00:52] Speaker B: Yes, thank you, Kent. It's always a pleasure to join you on your show. [00:00:59] Speaker A: So for folks who don't know you, why don't you give us a little bit of your background? [00:01:05] Speaker B: Thanks, Kent. We riffed about my past history quite a bit on your show. I remember we were talking about the good old days, the early days of data, when I joined Oracle. This was in the early 1990s. I've actually been in the data space my entire career. I started with Oracle, went into consulting, and then the dot-com boom happened, so I got into the e-commerce, e-business space. But once again, although I was in the middle of the explosion of the World Wide Web and the Internet, my work was mainly building data architectures and data models for websites. Then I took a hiatus and got into management consulting because it sounded cool. Booz Allen Hamilton is where I ended up. All of a sudden I was no longer doing data models; I was doing PowerPoint decks. And that was very much a culture shock for me.
But it taught me the skill of, bottom-up, telling the story of a business process through a data model, and then, top-down, telling that story to an executive. So I took that skill of top-down and bottom-up and joined Gartner. I worked there for almost five years, and again I was in the data space. Super exciting times. I remember in 2017 I got a call from a client somewhere in Europe. The person called and said, we are calling Gartner because we want advice on how to do GDPR. And my response was, I'm sorry, GD-what? [00:02:56] Speaker A: Oh no. [00:02:58] Speaker B: Because it was so brand new. [00:03:00] Speaker A: Yeah. [00:03:02] Speaker B: So that's how I got into data governance. I took my database management system skills, added data governance and data science, and then in 2021 I left to start my own company, called SanjMo, and become an independent analyst. And lo and behold, generative AI happened. And so I wanted to be on the cutting edge, because I'm a sucker for pain, I guess. Long nights, long research. People tell me, by the way, Kent, this is funny. They say, now that ChatGPT is out and it can help you write amazing content, why do we need analysts to write content? And I'm like, because I am writing stuff that ChatGPT does not know. I'm writing about trends; I'm trying to connect the dots on what's coming. And LLMs have not been trained on stuff that doesn't exist yet. So yes, there is still a place for us analysts. [00:04:09] Speaker A: That's good. Yeah. Job security still there? [00:04:13] Speaker B: Yes. Actually, job security is getting even better, because there's so much noise in the market that companies need someone to sort through it. Organizations have a full-time day job of running the business. They don't have time to research and figure out where they should invest their limited dollars. [00:04:34] Speaker A: Yeah. And I think there have been enough mistakes and hallucinations out of AI at this point.
Do you really want to bet the future of your company on asking ChatGPT what you should be doing next, without at least checking with someone like you and saying, okay, this is what we think we're hearing. Is this right? So that's the quality assurance. That's the QA part now of using AI, and I guess that's our job security. The humans are now going to be the QA instead of the other way around. Instead of writing programs to QA the data, we're now going to be the people QAing the results coming out of it. [00:05:22] Speaker B: I love it. Yeah. So in some ways, AI is really supercharging us. It's elevating the individual contributor. Actually, this is what Guy Adams, the co-founder of DataOps.live, says: every developer is now a team lead with ten subject matter experts working for that individual contributor, who is now a team leader. [00:05:48] Speaker A: Yeah, here you go. [00:05:49] Speaker B: One is doing documentation, one is writing a unit test, one is doing some regression analysis, and you, the developer, are actually a team leader managing this army of assistants. How cool is that? I think if we have that mindset, then AI becomes a helper and not somebody out there to take our jobs away. [00:06:18] Speaker A: Yeah, exactly. And I think that's great. So as you know, this season we've been looking back, because it's been four or five years now since we came up with these ideas around true DataOps and wrote the book DataOps for Dummies, and all the things that Guy and you and I and our friend Justin have been involved in. I want to start off with all of this: how do you define DataOps today? [00:06:57] Speaker B: There are many ways of defining DataOps. There are of course the seven pillars, which lay out in a very granular manner what you do: CI/CD, environment automation, observability, and so on.
I'm not going to go into each of those seven, because we have a link and people can go read them. When people ask me, what is DataOps? Is it something new? Should I worry about it? I've seen it defined as agile development for data. And they say, I don't understand; I'm a data person, I'm not a software developer, I don't understand agile. So help me understand: what is DataOps? The way I explain it is that there is always the what, the how, the why, the who of everything we do. What we do in the data space is try to make decisions, to derive some intelligence out of this massive amount of data we've been collecting for our organization. That's the what. And these days, we are building more and more data products. Data products help make data more consumable. The buzzword is that it democratizes data, but that's what organizations want. How they get to it is called data management, and that includes all the ETL, the pipelines, governance, data quality, all of that. So data management is how you do it. DataOps is how well you do it. You can build a data pipeline every day if you want, but if you're repeating that task and doing it from the ground up manually every single time, then there are limits to how much you can scale. How well you do it is DataOps. It brings in this whole process of, let's automate what we can automate, because the more manual tasks you have, the more places things can break. Put in auditing, monitoring, notifications: that is observability. Automate the testing. We talked about documentation, unit test cases, all of those things. So how well you manage your data so you can deliver data products is DataOps. This is a simplistic, plain-English definition rather than a more technical breakdown of the components of DataOps. [00:09:48] Speaker A: Yeah, like you said, you can look at the seven pillars, and that kind of gives you: these are the things you should be doing in the DataOps world.
But yeah, it's really about empowering the organization and the folks who work for it to be more effective in how they are managing the data, building and deploying data products, keeping track of it all, and being flexible and able to evolve with the organization's needs and the changing business requirements, and actually managing that well. [00:10:27] Speaker B: Yeah, absolutely. A practical application of DataOps is this example: a new person joins the organization, let's say in the data engineering department. In the past, this person would have to read manuals, go talk to people, and try to figure out how to write the code for the task they've been assigned. Today, what we get out of a good DataOps environment is that this new person becomes productive literally within days of joining. How? First of all, they can discover things. There's a catalog; they can discover what has already been done, so they don't have to start from scratch. Now let's say I've been tasked to determine what kind of customer churn I'm getting for a certain SKU. For that I need to write some SQL code. But how do I do it? I need to know what tables, what environment, and so on. With DataOps I can do a single click and an environment is set up for me. With zero-copy clone, for example, if you're using Snowflake, I get a read-only copy of the data. Then, using some other pieces that I'm sure we will talk about, I don't even have to write the SQL from scratch. I can reuse existing work that's been done for a different department, and then, using a natural language interface, I'm very quickly onboarded into the system. All those time-consuming pieces of infrastructure management and environment setup can be handled automatically. [00:12:30] Speaker A: Yeah.
So basically, what DBAs used to have to do, spending nights and weekends because they'd run out of space and all these other things would happen, is all automatable now. And we've got templates, like what you're talking about. A new data engineer comes into the organization, and there are templates, right? They don't have to write it from scratch. The engineer can now generate that data pipeline, and hopefully they've got a test suite, right? And they automate some of the testing instead of having to figure out, well, one, what pipeline tool are we using? Well, it's in your DataOps environment: this is what we use, and here's a template for doing what you want to do. And like you said, hopefully at some point with some of the tools you can use a natural language interface to say, I need to build a churn model using X, Y, and Z, and it generates the code, and then you can build the test suite. So they spend more time doing, I'll say, the thinking, right? Doing the thinking rather than doing the coding and risking, well, I don't know this particular tool. Or say it's Python: I've never used Python before, so now I'm having to do a lot of research to figure out how I do this in Python on Snowflake. That completely goes away, right? So the learning curve is way lower, and the rate at which someone can become productive is much higher.
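The zero-copy clone idea mentioned above can be sketched in a few lines. This is a toy illustration of the copy-on-write semantics behind cloning, not Snowflake's actual implementation or API; the `Table` class and its sample rows are hypothetical.

```python
# Toy sketch of zero-copy cloning: the clone shares the parent's storage
# until one side writes, at which point that side gets its own copy.
class Table:
    def __init__(self, rows):
        self._rows = rows  # shared storage reference

    def clone(self):
        # Zero-copy: the new table points at the same rows; nothing is duplicated.
        return Table(self._rows)

    def insert(self, row):
        # Copy-on-write: the first modification materializes a private copy,
        # so any table sharing the old storage is unaffected.
        self._rows = list(self._rows)
        self._rows.append(row)

prod = Table([("cust-1", "churned"), ("cust-2", "active")])
dev = prod.clone()              # instant, no data copied
dev.insert(("cust-3", "test"))  # only the dev clone sees this row
```

The point of the sketch is why a new engineer can get a full, safe copy of production data in one click: the clone costs nothing up front, and writes in the sandbox never touch the original.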
I mean, the only difference was that Instead of taking three weeks now, you could take only three days. So the graphic artist, and instead of costing $50,000 now you could do it for 5,000. So the companies benefited, but the graphic artists could now, instead of spending three weeks on one project, could do 30 projects. In that time frame, the barrier to entry lowered considerably more people could come in. And to your point, we were spending more time thinking about what would be a very creative graphic to represent my client. Rather than learning the, the tool or just trying to glitch together different things, you could actually focus on the idea and the business imperative strategy and let the tool just automate and do all the backend work for us. [00:15:50] Speaker A: Recently, as you know, ISG came out with this new set of DataOps buyer's guides. So wanted to spend a couple of minutes just getting your take on the new buyer's guides and what that tells us about the Data Ops space and how it's changed in the last couple of years. [00:16:08] Speaker B: The first thing that struck me when I saw the ISG Bias guide and the process they went through, I was amazed that the initial list of companies who play in the data space somewhat, if not across all the capabilities, that number is 49 companies. So that is a phenomenal number. And that to me shows the pent up demand of vendors wanting to automate some or all of Data Ops pieces. Eventually they pared it down that list because ISG has a very strict inclusion criteria. They have five different areas like data products, data observability. So basically they need to make sure that of these 49 companies, there were companies that met their inclusion criteria. Even then, that list ballooned to 17. So 17 companies were ranked across five different categories. DataOps was one of the three companies that actually showed up in all the five categories and it's a leader in almost all of them. 
So that just tells me how critical this bias guide is. By the way, Gartner also has a market guide and I'm pretty sure other vendors are also looking at creating their own version of a buyer's guide or a market guide because this space is in demand. And I have some thoughts, but I won't hold it for now of where I see this space going, but I'll hold it for now. [00:18:05] Speaker A: Okay. All right. So I guess that just does show that the space has evolved because even a couple years ago it was Ventana Research which got acquired by ISG. So they did did one of these. And DataOps Live didn't meet the criteria the last time they did this. And so things have evolved that much that not only did DataOps live meet the criteria for evaluation, this time, they ended up in the top lip. Top of the list. Yeah, along with some very other well known names. Much more well known than we were. [00:18:42] Speaker B: Yes, correct. So I'm not sure if you want to mention, but it's up to you. Actually, I was at an event last week called Gartner Data and Analytics Summit in Orlando, Kent. One of my biggest goals was to go to this event and actually talk to the end users. Because I talk to my fellow analysts all the time and I talk to vendors even more. But end users who I don't talk to. I was literally surprised at how stable. In fact, I even published my blog this week on my Medium blog site called Sancho Medium. And I wrote that data governance, data observability, data ops areas that were facing a little bit of headwinds the last few years, but seem to have stabilized. And when I talk to the clients, they tell me we are amazed how far these companies have come in literally one year. One year ago we Did. Did a poc. And we were like, oh, these companies are not ready this year. We are surprised at how quickly they have grasped what does a business want to be more productive? And they've incorporated those requirements. [00:19:59] Speaker A: Yeah, and it's. 
It's interesting, because one of the very early presentations, back slightly before COVID or during COVID, I can't remember, was one Justin and I did. We were giving this talk all the time: balancing agility and governance with DataOps. So yes, that was basically five years ago. And now, to see where we are, it's good to hear that the folks you're talking to are seeing that the space has evolved, that the vendors are listening to the business and helping enable them to do what they really need to do in this space. So did anything else really stand out from last week's Gartner Summit? [00:20:48] Speaker B: Actually, I'm eager to jump into something, and I'll add to what you just said. You said balancing agility and governance. I think there's a third piece that has been added, and I'm literally thinking on my feet here: control. It's balancing agility, how fast you can develop; governance, doing it in a secure and trusted manner; and then control, how much control you have over your data and its outcomes. Why am I adding control to this equation? Because one thing that came up over and over at the conference last week was this whole concept of the lakehouse. [00:21:40] Speaker A: Yeah. [00:21:41] Speaker B: If we look at what Snowflake has done: Snowflake has been by far the runaway success in what they call the data cloud, but in essence it's a managed cloud data warehouse, so Snowflake manages a lot of the underlying pieces for us. What is happening a lot in the market today, and Snowflake is reacting to it, along with Amazon, Databricks, Google Cloud, and Microsoft, is this whole concept of a data fabric, or, if we go a level down, the lakehouse, where the businesses are saying, we want more control over our data and we want to use open standards. Apache Iceberg, Delta Lake, and Hudi have been some of the common open table formats.
But what organizations are saying is that cloud data warehouses serve a brilliant purpose. If I'm a CFO and I want to build a dashboard, I don't want to muck around with the underlying Parquet files; let Snowflake handle that. But if I'm getting a lot of streaming data and doing a lot of exploration, and I don't know exactly what shape or form this will end up in, I may want to join my historical data with some of this data, and I may want to use more than just Snowflake. I may want to use Spark, or Flink, or some other technology. In that case my overhead is going to go up, because with control I'm actually having to deal with data at a much lower level than I did with Snowflake. For that, DataOps becomes hugely critical. For example, if I'm going to create a data product, I want the ability to take my original data, branch it into a new repository so I can do my experimentation, and then merge it back into the main branch, the main tree. This way I can make a versioned copy of my data, experiment, build my outcomes, and do automated testing. What I'm saying is that I think the space of DataOps is going to explode into this cloud data ecosystem, if I may call it that, which is beyond just a cloud data warehouse. It is a much more complicated world where customers have more control, but they need more management now, because it's a little riskier to be working on actual low-level data than it was through a warehouse. [00:24:52] Speaker A: Well, and so you need a level of governance on that as well. [00:24:56] Speaker B: Yes. [00:24:56] Speaker A: Right. One that reaches across the ecosystem. You see it even in Snowflake, where now you can attach to Apache Iceberg and even Delta Lake files directly from within Snowflake. But it's that whole idea of being able to go outside the boundaries to do the exploration. And what you just described still needs to be tracked, right?
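The branch-and-merge workflow for data described here, in the spirit of data version control tools such as lakeFS or Nessie, can be sketched as follows. The `Repo` class, branch names, and row contents are all hypothetical, meant only to show the shape of the workflow, not any vendor's API.

```python
# Toy sketch of branching a dataset for experimentation, then merging back.
class Repo:
    def __init__(self, rows):
        self.branches = {"main": list(rows)}

    def branch(self, name, source="main"):
        # Branching is cheap: start the new branch from a snapshot of source.
        self.branches[name] = list(self.branches[source])

    def merge(self, name, target="main"):
        # Once the experiment is validated, promote its state to the target
        # branch and retire the experiment branch.
        self.branches[target] = list(self.branches[name])
        del self.branches[name]

repo = Repo([("cust-1", 0.12), ("cust-2", 0.80)])
repo.branch("churn-experiment")
# Experiment freely: main is untouched until the merge.
repo.branches["churn-experiment"].append(("cust-3", 0.55))
repo.merge("churn-experiment")
```

The design point is the same one made in the conversation: the experiment gets an isolated, versioned copy of the data, and only a deliberate merge changes what everyone else sees.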
Yes, because if you get some good results, then like any good experiment, you have to be able to repeat it. And if you don't have the documentation and the version control on what you did, how do you go back and scale it and productionize it? And that is DataOps, right? Regardless of the framework or the tool or the platform you're working with, those are the concepts we need in order to be successful in this area. [00:25:50] Speaker B: That's very true. [00:25:51] Speaker A: So: agility, governance, and control. [00:25:55] Speaker B: Yes. And by the way, control is new, because in the past we had ceded control to the hyperscalers with fully managed services. For some use cases, by the way, that's amazing. Fully managed, serverless: it reduces my cost, and it reduces the cost for providers through multi-tenancy and all of that. But now we are also getting into the generative AI space, and we have far more analytical tools than we used to. So we need to be cognizant that bringing the right compute engine to the right use case is probably the best strategy for the future. Spark may not be the right technology; maybe it's Pandas, or maybe it's Ray, which is getting traction for some machine learning. If I'm doing feature engineering and feature stores, I need a certain compute engine. If I'm doing ETL, I have a choice of engines, some open source, some proprietary. If I'm doing real-time analytics versus dashboards and reports versus batch analytics, for one SQL may be good, but for another Python may be good. In this scenario the role of DataOps becomes phenomenally more important, because if we don't do it, we will lose control over this complex environment. [00:27:34] Speaker A: Yeah, well, you just answered one of my underlying questions, the one that's always there: is DataOps more important now than it was in the past? And yes, the answer is yes.
So, quickly: where do you see DataOps going next? [00:27:49] Speaker B: The most exciting space, like in all other areas, is this whole natural language interface, and it's literally a progression of everything we've been talking about. When you and I started our careers, we were still dabbling with some assembly code, but then we started using 3GL and 4GL languages. We've literally abstracted away a lot of the complexity, the computer-speak, the geek-speak. Now, with natural language interfaces, it's the same progression; we are not really on a totally different track. I've seen some of the capabilities that have come out, even in DataOps.live, where I can ask the system to create a data product for me. It does a fairly decent job. There are some quirks with the UI, but it's getting better every day. Every time a new large language model comes out, it beats the previous capabilities. What was unimaginable six weeks ago is now possible through some of these agents that run things more autonomously. So I'm very upbeat on the addition of generative AI features to the DataOps space, taking it to a different level where I'm literally giving natural language commands and having it write my SQL. Now, a lot of people who are listening may say, yeah, but is this SQL accurate? The way I look at it is that if I can have it take care of 60 to 80% of my job, I'm still 60 to 80% more productive. And it's just a matter of time before we know it will do the job it's expected to do. Once again, I'm channeling what Guy says: we never question a compiler. We write something in a language of choice, maybe Rust or C or C++. We compile a C program, and we don't go and check whether the machine code or assembly code that was generated is correct. It's unthinkable, right? So today we may question the efficacy of generative AI, but in a few years it'll be a moot point.
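The "is this SQL accurate?" concern often shows up as join mistakes. Here is a tiny illustration in plain Python (no real database; the table contents are made up) of how a generated join that is missing its key condition degenerates into a Cartesian product:

```python
# Two toy "tables" as lists of tuples.
customers = [("c1", "Acme"), ("c2", "Globex")]
orders = [("o1", "c1"), ("o2", "c1"), ("o3", "c2")]

# Join with the right key: each order matches exactly one customer.
keyed = [(c, o) for c in customers for o in orders if o[1] == c[0]]

# "Join" with no join condition, which is what a generated query can produce
# when the model doesn't know how the tables relate: every row pairs with
# every row, a Cartesian product.
cartesian = [(c, o) for c in customers for o in orders]

print(len(keyed))      # one row per order
print(len(cartesian))  # 2 customers x 3 orders pairings
```

This is the point Kent makes next: the generation is only as good as the data model it is given, because the model is what tells the tool which keys relate the tables.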
[00:30:41] Speaker A: Yeah, I think you're right on that, especially when it comes to generating the SQL. The thing I go back to is the generated SQL I've seen and what it's able to do: writing SQL and joins, whether it's an inner join or an outer join, applying all sorts of transformations. That stuff works great, as long as you have a good underlying data model. If you haven't done the data management part right, will you get the right answers? You'll only get the right answers if the AI engine actually has decent inputs, which in this case means a good data model. Otherwise you can end up with a Cartesian product because the cardinality between the tables isn't right, and the AI may not know that. So it really comes back to the inputs: the input to all of this, to DataOps, is good data management practices. If you've got that, then we can go so much farther. [00:31:46] Speaker B: Yeah. And today, by the way, the way we are operating is that we've got traditional software and we're putting AI on top; it's basically the interface that we're changing. In the future, actually, I think AI will be able to understand the context and may even be able to find primary key/foreign key relationships. [00:32:07] Speaker A: That'd be great. [00:32:09] Speaker B: We haven't reached the level yet where we fundamentally change the way software is written, with AI-first or AI-native principles. Right now we're putting AI on top of traditional software. [00:32:22] Speaker A: Yeah. So what's next for you? Have you got any other webinars? And obviously you have your podcast, where people can hear you on a regular basis, right? [00:32:34] Speaker B: Yep. Yeah. So I'm continuing to dive deeper into the intersection of AI and data.
In the blog I mentioned earlier, about the Gartner conference, I wrote that there's only one moat that organizations have, and that is their data. Everything else just comes and goes. Today it's a model, tomorrow it's an agent, the day after it's a new reinforcement-learning-with-human-feedback variation. I mean, those things do matter, but they're fungible. Your data is not fungible. Your data is your moat. And if you don't get a good handle on your data, AI cannot do much, because it'll hallucinate even more. [00:33:28] Speaker A: Exactly. All right. Well, unfortunately, as we knew would happen, we have run out of time on our episode already. So thank you for joining me today and sharing your insights on the Buyers Guide, the state of DataOps, and what you learned at the Gartner Summit last week. I think that was great. Thanks to everyone else who joined us online or is watching the replay. Join me again in two weeks: my guest is going to be Professor Barzan Mozafari, the co-founder and CEO of Keebo, and that's going to be a very interesting conversation, I'm sure. As always, please like the replays of today's show and tell your friends about the True DataOps Podcast. And don't forget to go to TrueDataOps.org and subscribe so you don't miss the notifications for our next episodes. Until next time, this is Kent Graziano, the Data Warrior, signing off. For now.
