Episode 15

May 15, 2024

00:27:39

Armon Petrossian & Satish Jayanthi - #TrueDataOps Podcast Ep.35

Hosted by

Kent Graziano


Show Notes

In this episode of the #TrueDataOps Podcast, host Kent Graziano, the Data Warrior, interviews Armon Petrossian and Satish Jayanthi, co-founders of Coalesce.io. They discuss their backgrounds in data management and the creation of Coalesce.io, a platform focused on resolving issues in the data transformation layer with metadata-driven automation.

Armon and Satish emphasize the importance of DataOps, describing it as a set of best practices from development to deployment, with a strong focus on automation to reduce errors and improve data quality. They discuss their partnership with DataOps.live and how both organizations aim to promote DataOps best practices within the Snowflake ecosystem.

The conversation also covers the concept of data products, advocating for high-quality, self-service data outputs that are well-documented and reliable. They highlight the need for a unified vision and cultural alignment within organizations to successfully implement DataOps and build effective data products.

 


Episode Transcript

[00:00:04] Speaker A: Alright, welcome to this episode of our show, #TrueDataOps. I'm your host, Kent Graziano, the Data Warrior. In each episode, we're going to bring you a podcast covering all things related to DataOps with the people that are making DataOps what it is today. If you've not yet done so, be sure to look up and subscribe to the DataOps.live YouTube channel, because that's where you're going to find all the recordings of our past episodes. So if you've missed any of our prior episodes this season or last season, you still have an opportunity to get caught up. Now, to make sure you don't miss any in the future, if you go to truedataops.org, you can just subscribe to the podcast and you'll get proactive notifications when we schedule upcoming events. So my guests today are my good friends, co-founders of Coalesce.io, Armon and Satish. Welcome to the show, guys.

[00:00:54] Speaker B: Hey, Kent, thanks for having us. Looking forward to this.

[00:00:58] Speaker C: Glad to be here.

[00:00:59] Speaker B: Yeah, it's nice to have you interviewing us for a change.

[00:01:02] Speaker A: Yeah, for a change. Yeah, yeah, yeah. Because I've been on your Coffee with Coalesce multiple times since you guys came out of stealth, and this is the first time I've gotten you guys on the other side of the questions. So thanks for being here. You know, congratulations on your success. Series B, that was really exciting news. Glad to see that moving along. So, I guess, you know, this, this Coalesce thing's got something to it. Apparently, at least a few people think so.

[00:01:32] Speaker B: Yeah. Yeah, we like to think so too.

[00:01:35] Speaker A: Yeah. And as Satish says, you're metadata-driven. So that's becoming a great theme. So for the folks who don't know you two, give us a little bit about your background in data management and why you founded Coalesce.

[00:01:50] Speaker B: Yeah, I'm happy to answer on the founding piece. I mean, both Satish and I have worked together for nearly a decade, and basically we interacted with dozens of different customers at the largest scale when it comes to data warehousing and data engineering projects that this world has ever seen. And while interacting with these implementations, we saw these very specific breaking points within implementations. What's held back organizations from being successful with analytics historically has largely been around taking raw data once it's landed in the database and getting it to the point that it's consumable, with proper documentation, lineage, governance, and just an overall efficient data engineering operational practice. When we saw that enough times, the thought of starting a next-generation product that solves that major bottleneck in the analytics workflow was something that we had always fantasized about. And finally, enough things tipped in our favor where we decided to start a company to build a software product that was tailored towards resolving what is historically, and still is, the biggest bottleneck today in the analytics workflow. And that is what we call the data transformation layer. So that was the genesis. And Satish and I worked together in various different roles, but this one is the most fun by far, which is co-founders.

[00:03:20] Speaker C: Yeah, absolutely.

[00:03:23] Speaker A: So, Satish, tell me a bit about your background.

[00:03:26] Speaker C: Yeah, so Armon covered the last decade or so, how we worked together. Obviously, prior to that, I was mostly on the other side,
implementing real-world data warehouses, enterprise versions, managing data teams. I had been in environments where I had pretty much every ETL tool that I can think of, and large teams, yet I had trouble delivering on time to the business. And if you think about it, it's the same problem, right? The data transformation, that was always the problem. Business would ask simple questions that would take forever for us to answer because, you know, we have bought five or ten companies in a matter of one year, each one has their own data warehouse, and the CEO is asking a question that would require us to integrate all of the data, and that takes a long time. So all the pain points that I have experienced in the past, you know, we are able to put that into this product.

[00:04:45] Speaker A: Yeah, I think that's kind of the way I felt when I ran across Snowflake back in 2015. It's like all the problems I'd seen in trying to build scalable data warehouses over a couple of decades, it's like, okay, Snowflake was really solving it. And like DataOps.live, which is built exclusively for Snowflake, you guys chose to just build a product for Snowflake as well. What was your reasoning behind that?

[00:05:13] Speaker B: Yeah, that's an easy one. So when you think of what companies have been the most successful in the analytics industry, there is typically a unifying theme around the product philosophy. And when I say that, I mean there are levels of ease of use, but extensibility. There are levels of automation, and then typically speaking, there's some architectural advantage. But those core themes, those core pillars around automation, ease of use, and making what was complex a lot more simple, but still extensible, are what has been the most successful. So think like Tableau on one end, think Snowflake on the database side. Fivetran certainly has these same characteristics on the ingestion side, and so that audience of users typically wants a best-of-breed solution. And we wanted to build a product that embodied all those same characteristics for an area of the workflow, the area of the analytics workflow, that has historically had none of that. And so we laser-focused in on the Snowflake ecosystem, knowing that the users within that ecosystem appreciate something that's simple but extensible, something that's automated to help accelerate efficiencies, and really just focused on how do we make people as productive as possible when working with our platform, the same way that you see with the Snowflake platform.

[00:06:44] Speaker A: Great. Yeah. So you guys have partnered with my buddies here at DataOps.live. Tell me a little bit about how that partnership came about and what you think the potential value is to your, to your customers.

[00:06:58] Speaker B: I think it came about through you, Kent. You're the one who introduced us, probably. So, yeah, yeah. You know, I think the core, the core focus there with the relationship with DataOps.live, it really is under the premise of DataOps in general: us having a focus on improving DataOps for customers within the Snowflake ecosystem, and any way we can encourage that type of behavior, encourage that type of usage when it comes to our platform, is something that we're interested in. And I remember clearly when we were at the first Snowflake Summit, when we came out of stealth, that was like one of the biggest takeaways from you: hey, you gotta go talk to these guys and figure out how you can work together, basically.
So we've been close with the team there for quite some time, and I think it's all focused around how do we, how do we encourage the adoption of DataOps best practices in general. That is partly embedded in our solution, I would say it's like a focus of our product, certainly a focus of DataOps.live's product, and I think something that's important for the industry to acknowledge and move towards.

[00:08:13] Speaker A: Satish, from your perspective, how do you define DataOps? What is it, and how important do you think it is in today's landscape?

[00:08:23] Speaker C: Yeah, absolutely. So DataOps is a set of best practices from end to end, from the time when you start building your pipelines all the way to scaling them and testing them and deploying them efficiently. So all the best practices, technologies, and tools combined together to be able to make that happen in the best possible way. In the past, some of these practices were mostly around software application development. That's where it started. It's called DevOps. When Agile came in, Agile addressed the development aspect of things: you know, you break your work into sprints, you know, you fail fast. And all of those practices were implemented, but primarily around development. But then what they found out was this last leg of the problem: how do I deploy this? You know, then CI/CD started, okay, so there's got to be a way to incrementally release these things, you know, versioning things, automated deployments and all of that. Automated testing, integration testing. So those practices have been, you know, inherited into or imported into data environments. And that's what is defined as DataOps, essentially. The idea is the same, although data projects are a little bit different, quite different from applications, because there is this big, huge elephant in the room, which is your data set; applications don't usually have that. So yeah, again, that's kind of overall what DataOps is.

[00:10:18] Speaker A: Do you guys think that it's really possible for organizations to deliver at scale with all this data if they're not adopting some sort of agile DataOps approach or mindset? They kind of have to have that, don't they?

[00:10:32] Speaker B: I mean, unless they've got unlimited resources and time, which most companies don't. I guess some companies do, but that's just a mess that I don't think anybody should imitate. And so realistically, I think from Satish's experience, even before the major cloud migration and movement, there were characteristics of DataOps that we were maybe subconsciously implementing through products that we've worked with, but it's now becoming something where it is possible to take that fully holistic approach, and that encourages success. So could you be successful? Maybe, but "don't try it" would be what I would say.

[00:11:15] Speaker A: Yeah, I guess I need to change the question from "is it possible to deliver value at scale" to "is it possible to successfully deliver value at scale without doing something like this?" And like you said, yeah, you could. Again, how do you define business value and what's your success criteria there? But yeah, like I said, yeah, what...

[00:11:37] Speaker B: What are you spending for the gain, basically, right?

[00:11:40] Speaker A: Total cost of ownership along with the ROI. So obviously you guys are like super into automation. I mean, that's kind of what your platform does. So what do you see as the role of automation in the DataOps-type processes?
Maybe even, if you want to, go off into the AI-driven type.

[00:12:02] Speaker C: Yeah, there is a lot of opportunity in DataOps as well. I've been looking up some products where, for example, when somebody merges their code into the version control system, an AI system can kind of summarize what the changes are based on the code that they're merging. That is, in a way, automation to write comments, because when you're merging code and you don't write proper comments, or you forgot, then you might have some problem down the road with that. But AI automation, or automation in general, definitely helps every part of this stream, and, you know, there's automation in every aspect of it. For example, the way that we do it, we automate the transformation piece, so the development side of things, you know, but there is automation around deployment. For example, as soon as you merge some new code into a test environment or some environment, can that kick off an integration test, a suite of integration tests, and spin up a new environment, and do it really fast without anybody intervening manually to do all of that? So that's definitely like a very valuable step to be automated. So there are a number of opportunities, and I think it's very, very important to automate, especially, again, at the scale at which we want to build these: reduce the number of errors, reduce manual intervention, and improve data quality. All of these things happen only, I think, by incorporating automation every step of the way, wherever it's possible; there's no other way.

[00:14:03] Speaker A: So it's really automating the entire DataOps workflow from one end to the other. Yeah. One of those seven pillars of #TrueDataOps is automated testing and monitoring. And you kind of just touched on that: we want to eliminate those manual steps of saying, okay, we've checked the code in, did somebody run the test? And then what were the results? And having to manually go look at all that. If we can automate that, that part of the workflow, for sure, to make sure that we, we have automated regression testing, so that if somebody makes a change and they check it in, there's no question that it's going to get tested. It just, like, it just happens, right?

[00:14:44] Speaker C: Yep.

[00:14:45] Speaker A: If we're going to move forward in these things in, I'll say, a rapid, agile manner, then we're getting down to the classic question: how many changes can we deploy in a day? We're no longer talking about a three-month project cycle or even a two-week iteration or a two-week sprint. Can we do better than that? I know that that's now possible today when you throw in tools like DataOps.live and Coalesce. I've seen some great demonstrations from Dougie on how fast he can make changes to things in the data pipelines just using Coalesce, because so much of it's automated: it's a couple of changes, push a button and regenerate the code, and you're off to the races. So data products, another hot industry buzzword that's really kind of taken on a life of its own. The idea of data products has kind of been there for a while. But then when Zhamak Dehghani wrote her papers in 2019 about data mesh, it kind of raised it up a level, and now it's kind of taken on a life of its own. It's not just a data mesh concept. Everybody's starting to talk about things like building data products. So what's your guys' take on that as a concept?
And do you think it's an important approach that people should be adopting in our data landscapes today?

[00:16:25] Speaker C: Yeah, I mean, I can answer that. So it's true that the concept has been there, but I think the awareness of consciously building something of high quality has kind of taken on importance now, because we have dealt with data quality issues in the past, lack of documentation, whatever that is. You know, when we say data as a product, then we automatically assume certain standards that come with the product. When you go to a store and you buy a product, you expect a manual with it. You don't expect somebody coming to your home to teach you how to use the product, and you expect it to work just based on what is said in the manual. The same kind of principles apply here. A data product could be anything. It could be one table. I could just publish a table that is consumed by somebody, or it could be one API or a set of APIs. It could be anything, as long as I'm producing something of high quality that is going to be consumed by someone else in a self-service manner. And, you know, something that is very highly reliable, documented, and, you know, properly managed, that is a data product. Now, how you get there is not as easy, you know, as it sounds, because, you know, we'll come back to all these other things that we have to do, right, in order to produce something of that high quality. Yeah. So our take on that, again, Armon and I talk a lot of times about data mesh and all these great paradigms that are very, very important. It takes a lot of work, and we at Coalesce, we like to think we play a big role in that.

[00:18:31] Speaker A: Yeah. So I was going to ask you guys, how do you see Coalesce fitting into the development and delivery of, of data products?

[00:18:38] Speaker B: Yeah, so, I mean, first off, I would say both Satish and I talk a lot about data mesh as a concept, as a paradigm, and we're very encouraging of organizations implementing it. I think historically, what's plagued enterprises by not taking a data mesh approach is that you basically are building silos; there are siloed efforts between each department, whether that's core IT teams versus the finance team, the marketing team, or any other departmental citizen data engineer that's trying to do a last-mile effort. And so when we viewed how we approached supporting that methodology, there needed to be a balance between giving the most sophisticated technical users all the extensibility that they could possibly want when interacting with Snowflake, but doing so in a way that allowed them to be exponentially more productive, yet at the same time making the Coalesce platform easy enough to use that your less technical resource within a department can also be just as productive as one of those highly sophisticated principal data engineers or data architects, and everybody in between those varying skill sets. And so, as the organization goes about building data products, being able to have metadata that bridges the gap between the projects across the organization, from central IT to finance, marketing, or whatever other component of the business is building out their own data products for whoever the consumers are, is really the grand vision here.
And so I think the biggest item holding organizations back from being able to be successful with data mesh or building out data products has largely been around how time-consuming and difficult and risky getting data to the point that it's consumable for insights, with lineage, with proper documentation, has been. And that really is the crux of the issue. That's the foundational problem. That's, that's held people back from being able to be successful with a data mesh approach and being successful at building data products at scale, efficiently.

[00:20:58] Speaker A: Yeah, yeah. So what kind of advice do you have for organizations that are really getting started on this? What kind of things do they need to take into consideration if they want to be successful, like you were just saying, in delivering data products with a, I'll say, distributed data mesh type organization?

[00:21:21] Speaker C: Yeah, it definitely starts with the culture of the organization. You know, first, understanding how important the data is by every department, and understanding that it is important for them to build a high-quality data product with the intention to share it with others, like other parts of the business. So at the end of the day, you can have this data coming from the owners of the data, in the sense of, hey, the part of the business that originated this data now has the tools to curate this data and then aggregate it to a point where it is used by the company. So that is the vision that everybody in the organization should understand first. That's where we need to go. Because a lot of times what happens is we throw these buzzwords around the organization, everybody has a different idea of what it is, and it never gets accomplished. So establish that vision and the benefits. That's the number one thing. And then everything kind of falls in place. I would say you've got to find the right people, the technology and all that, and we can talk about that all day long, but I would say if you are able to accomplish that number one thing, which is the vision, and make sure that everybody understands it and is aligned, most of the time you'll make a lot of progress towards that goal.

[00:22:58] Speaker A: Yeah. Because otherwise, you can take a tool like Snowflake and Coalesce and DataOps.live and throw it at an organization. But if they don't have that agreed-upon vision and approach, you can just use those tools to build legacy data silos faster.

[00:23:16] Speaker C: Exactly. Right?

[00:23:17] Speaker B: Yeah. I can do this in any department. Right. Like, in any department of a business, you could have the best technology. But if you're not, if your vision isn't there, and if you're not culturally aligned on what the expectations and outcomes are, then it doesn't matter how good the tech you have is. It's from the top. So it's failing from the top.

[00:23:39] Speaker A: Failing from the top. So what's coming up for you guys next? Over the last couple of months, you co-sponsored a whole lot of Snowflake's Data for Breakfast events. You guys were literally all over the planet. And then we just had the Worldwide Data Vault Consortium in Stowe, Vermont, a couple weeks ago. So what's next? What else have you guys got coming in the pipeline?

[00:24:08] Speaker B: We've got one really big event, the one that I'm always most excited about. It's typically the best week of the year, which is Snowflake Summit, coming up here in a few weeks. So we've got a big booth.
It looks like there's a QR code down at the bottom center here if you want to check out any information about Summit, or register if you're not already registered. It's always the best event of the year. And it's finally in San Francisco instead of Vegas. So for me, I get home court advantage. I don't have to get a hotel or get on a plane. This is great news. So we'll be there.

[00:24:45] Speaker A: Lucky you.

[00:24:46] Speaker B: Yeah. Yeah, lucky me. Not lucky for everybody else. I know it's harder to fly into SF than it is to fly into Vegas. But it will be an awesome experience and we'll be there in full effect. So we'll have quite a few people. There's a big booth on the exhibit floor. There are numerous sessions that we're promoting with some of our customers, like Justin from the Shane Company, Susan from Paytronix, and then a couple of demos, live demos, that we're doing on the Snowflake platform with Coalesce. So that's the biggest thing that's on my radar as far as work goes.

[00:25:23] Speaker A: That's pretty big. That's pretty big. Yeah. Overall and. Yes.

[00:25:26] Speaker B: Yeah.

[00:25:26] Speaker A: So it's back to San Francisco. The first one in 2019 was in San Francisco.

[00:25:31] Speaker B: That's right. It was. We were there. Yeah, we were there. I remember being there. That was... I mean, I'm sure it's going to be a lot bigger. Yeah.

[00:25:39] Speaker A: A lot bigger now than that one.

[00:25:40] Speaker B: In contrast compared to that one. That one was at the Hilton, I think.

[00:25:44] Speaker A: Yeah. Yeah. It was the Hilton Union Square.

[00:25:46] Speaker B: That's, like, not even enough to take probably a tenth of the guests. So, yeah, it'll be a lot of fun.

[00:25:54] Speaker A: Yeah. Great. And so what's the best way for folks to connect with you guys?

[00:25:58] Speaker B: Oh, I'm easy. You can find me on LinkedIn. Shoot me a connection request or a follow. There you go. There's another QR code here at the bottom. I don't think, Satish, you're that much more difficult, to be honest with you.

[00:26:11] Speaker C: Yeah, same, on LinkedIn.

[00:26:12] Speaker B: Yeah.

[00:26:13] Speaker A: Yeah. There he is. All right, well, great. You know, I appreciate you guys being on. I know it's been busy with your fundraising, and I'm sure you guys are in the throes of really getting ready for Snowflake Summit. So thanks for taking the time and being my guests today. It's awesome having you on here and having you on the other side of the, of the questions this time. So, you know, thanks for joining. I want to thank everybody else who is joining the live stream or is watching this on replay. Thanks for your participation. Be sure to join me again in two weeks; I'm going to have the last episode of season two. My guests are going to be my good buddies Frank Bell and Stuart Bryson. So that should be a fun chat. And it's also going to be literally less than a week before Snowflake Summit, so I'm sure that'll come up. So, as always, be sure to like the replays from today's show and tell all your friends about the #TrueDataOps podcast. Don't forget to go to truedataops.org and subscribe to the podcast so you don't miss the future episodes. Later this summer, we'll start announcing what we're going to be doing for season three, and I'm sure you're not going to want to miss any of that. So until next time, this is Kent Graziano, the Data Warrior, signing off for now. See ya.
