Episode 10

February 21, 2024

00:32:57

Ronny Steelman - #TrueDataOps Podcast Ep.28 (S2, Ep10)

Hosted by

Kent Graziano

Show Notes

In this episode of the True DataOps podcast, host Kent Graziano sits down with DataOps expert Ronny Steelman to explore his data management journey and expertise in DataOps. Ronny shares insights from his experience, particularly with implementing the DataOps Live platform for Snowflake. Throughout the conversation, they delve into the advantages DataOps offers clients and development teams, distinguish DataOps from DevOps, and cover the impact of artificial intelligence and machine learning and the rise of data products.


Episode Transcript

[00:00:00] Speaker A: Welcome to this episode of our show, the True DataOps Podcast. I'm your host, Kent Graziano, the Data Warrior. Each episode, we're going to bring you a podcast covering all things DataOps with the people that are making DataOps what it is today. If you've not yet done so, please be sure to look up and subscribe to the DataOps Live YouTube channel, because that's where you're going to find the recordings of our past episodes. So if you've missed any of the prior episodes, you can catch up there. Better yet, if you go to truedataops.org, you can subscribe to this podcast and get proactive notices when we're going to do another session. My guest today is a DataOps implementer and architect, Ronny Steelman. He's the CEO of the consultancy Quadrubyte. Welcome to the show, Ronny. [00:00:51] Speaker B: Thanks for having me, Kent. Excited to be here. [00:00:53] Speaker A: Yeah. Well, so why don't you tell us a little bit about your background in data management and your experience with DataOps? [00:01:02] Speaker B: Yeah, definitely. So I've been in the world of data management and data governance for about 13 years now. I started off as a software developer, did a stint in New York City working for an organization there that did analytics and data management for marketing. So search engine optimization, back when keyword mining and all of that was really big. It was during that time that I found that I really had a passion for data. It just really clicked for me, and so I transitioned out of software development and more heavily into data architecture, data management, ETL/ELT processes, things of that nature. I've gone through very different organizations: healthcare companies, companies like IBM. I've done consultancy, and then most recently, in the past few years, decided to start my own boutique data consultancy that focuses specifically on data, data management, data governance, and then of course analytics, AI, and machine learning.
And then within the past year or so, we've added in blockchain as a data source, so being able to use decentralized blockchain solutions for transactional data. [00:02:14] Speaker A: Well, so recently you've been pretty hot and heavy out there actually implementing DataOps Live on Snowflake for customers. I want to start off with, from your experience doing that and in your perspective: what is DataOps beyond the software, and how does it really fit into the evolving data landscape? [00:02:37] Speaker B: Yeah, what drew me to the DataOps platform, and kind of that term of DataOps, is my background in DevOps. So I've been doing DevOps for a number of years. When I worked for a consultancy about six years ago, I was instrumental in helping to develop their internal DevOps product. I then went on to work for a company that did DevOps as well. And so just that entire mindset of continuous integration, continuous development, governance, security, maintainability of environments, all of that was ingrained in me, and I started thinking, how can we do this in the broader scheme of the data world? I did some of my own things using GitHub repositories whenever I moved to Snowflake, using the Snow CLI. It was during that time that DataOps Live broke out into the market, and I got very interested in what they were doing and brought them into one of the organizations that we were working with. It was very successful there. We've gone on to implement it again for another organization, and in both instances the customer has been extremely happy, the developers on that side have been extremely happy, and even my team has been extremely happy working with the platform. [00:04:03] Speaker A: So what are some of the benefits that folks have seen? The customers, obviously, and your team. So there's two perspectives there: the benefit for the customer, and the benefit for your team that's actually doing the implementation.
What have you seen there? [00:04:18] Speaker B: Yeah, so from a team perspective, it is amazing that we get to drop into a development environment that we're more comfortable with. In our case, Visual Studio Code integrates nicely with Git. We have dbt at our fingertips, which is the underlying foundation of the DataOps Live platform. That's been extremely helpful. A lot of the functionality in DataOps Live, such as command line features to reverse engineer certain tables into their SOLE language, things like that, just really speeds up development. And then of course, being in a GitHub repository and that type of co-development environment allows our team members to work on things together, work on things in tandem, and just manage all of our different work streams more effectively and efficiently, without stepping on each other's toes. From the customer side, I would definitely say that environment management is probably the biggest thing that our customers lean into whenever they're deploying something like this platform. Being able to test things in lower environments before impacting your production users is something that I think resonates well with every customer, every company. The other thing that I think a lot of customers get a lot of value from are the ELT/ETL processes that you can build into the platform. So it goes beyond just managing your data model; it also allows you to manage your data pipelines as well. [00:05:47] Speaker A: Yeah, and you mentioned. Well, there's so many things there to go into. The collaboration aspect, which is one of the seven pillars of True DataOps, is the collaborative environment. It sounds like that was fairly beneficial for your team as they're trying to build and deploy things for the customers. [00:06:07] Speaker B: Yeah, definitely. The collaboration, I think, is probably the strongest thing that internally we see as a development team.
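The "pipelines as well as data models" idea from this exchange can be sketched in a few lines. This is purely illustrative, not how DataOps Live is implemented; the step names and transformations are invented. The point is that a pipeline declared as code runs the same ordered steps every time, and every run can be logged.

```python
# Illustrative only: a declarative ELT pipeline where the ordered steps live
# in code (and so in Git), like the managed pipelines Ronny describes.
# Step names and transformation logic are invented for the sketch.

def run_pipeline(steps, data):
    """Apply each named step in declared order, recording what ran."""
    log = []
    for name, fn in steps:
        data = fn(data)
        log.append(name)
    return data, log

elt = [
    ("extract",   lambda rows: rows + [{"id": 3, "amount": 30}]),
    ("transform", lambda rows: [dict(r, amount=r["amount"] * 2) for r in rows]),
    ("load",      lambda rows: rows),  # stand-in for a COPY/MERGE step
]

result, log = run_pipeline(elt, [{"id": 1, "amount": 10}, {"id": 2, "amount": 20}])
print(log)        # ['extract', 'transform', 'load']
print(result[2])  # {'id': 3, 'amount': 60}
```

Because the step list is data, the same runner can promote the identical pipeline through dev, QA, and production environments.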
The other thing that I would say, in kind of that same space, is the ability to have feature branches within Snowflake. That, I think, has been phenomenal in ensuring that we're not touching customers' core data. We don't have to worry about modifying data and trying to figure out what its original state was. [00:06:36] Speaker A: Yeah, yeah. And I think environment management is another one of the seven pillars. And you mentioned that from a customer perspective that was beneficial for them, but it sounds like it is on the developer level as well. So the ability to have not only dev, test, and QA, but, like you just said, branching, being able to branch off of dev. That's one of the core features that I've really liked a lot. And it was based on a core feature in Snowflake that I loved when I was working at Snowflake: the zero-copy clone. [00:07:12] Speaker B: Yes. [00:07:13] Speaker A: Because in the data world, you know, and I go back a long way with data warehousing, we were never able to have enough data in QA in particular to actually really test stuff before we put it into production, because it was just too expensive to have that disk space. Literally, to have the disk space to copy a 75-terabyte data warehouse and have it available for developers and testers. [00:07:46] Speaker B: Yeah, exactly. And then of course, there's all the other concerns about copying data around, data security, data privacy, all those things that just almost fall to the wayside whenever you have a solution like this. [00:07:58] Speaker A: Yeah, I guess governance is another feature in there, right? [00:08:03] Speaker B: Yeah. I would say that in the current implementation we're doing today, governance and change control, which is one of the pillars of course as well, has probably been the most beneficial to the organization that we're implementing it in.
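The feature-branch pattern discussed here rests on Snowflake's zero-copy clone: `CREATE DATABASE ... CLONE ...` creates a writable copy without duplicating storage, so each Git branch can get its own isolated environment. As a sketch, the helper below just derives a branch-specific database name and builds the DDL string; the naming convention is an assumption, not DataOps Live's actual one.

```python
# Sketch of the feature-branch environment pattern. Only builds the DDL
# string; database and branch names here are hypothetical, and the
# name-mangling convention is an assumption for illustration.

def clone_for_feature_branch(source_db: str, branch: str) -> str:
    """Build a zero-copy-clone statement for a per-branch environment."""
    env = branch.upper().replace("-", "_").replace("/", "_")
    return f"CREATE DATABASE {source_db}_{env} CLONE {source_db};"

print(clone_for_feature_branch("ANALYTICS", "feature/fix-orders"))
# CREATE DATABASE ANALYTICS_FEATURE_FIX_ORDERS CLONE ANALYTICS;
```

Dropping the clone when the branch merges discards the environment without ever having touched the source data, which is the "not touching customers' core data" property Ronny calls out.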
Being able to have that governance around who can see the data, who can access the data, who can modify the data, even. All of that, I think, has been extremely beneficial to the organization. And it really allows them to have a more clear and focused lens on something that is so pivotal, because for this customer specifically, there are a lot of government standards that they have to meet, and those standards are reported upon by their data. So if their data isn't locked in place, or if there's any question about their data whatsoever, it leads to audits and possibly even loss of government funding. [00:08:57] Speaker A: Yeah, yeah, that's a pretty negative outcome that they would not want to see happen. So a little branch off of this one, because you're getting into AI and all of that as well with this, you know, I'll say, rise in interest, at least, in AI and machine learning. Are you seeing more organizations concerned about the governance aspects of the data that's being fed into these models? [00:09:27] Speaker B: We definitely are, but it's more in terms of where does the AI live? Because they want to know, where does my data live? Ultimately, at the end of the day, who can access the data that's being fed into them? So, you know, Snowflake is coming out with more AI-driven solutions. And sometimes the concern there that we've heard from customers is, if this is being pushed into the Snowflake system for AI consumption, is that same data, or those same trends and models that are being built off of our organization, being used to influence other organizations maybe across the board? Now, I think that comes in large part due to some people's misunderstanding of how Snowflake is implementing AI. Versus if you go out to ChatGPT and paste a bunch of code or data into the chat prompt, they kind of see those as equal things in terms of your data just kind of disappearing into this black hole.
And so I think a lot of what we are seeing today in terms of AI and its adoption is mainly just around education and a little bit of the fear factor. [00:10:35] Speaker A: Yeah, yeah. So it's like you said. Yeah, I think I saw something about that just earlier this week, that very concern that if you're feeding data into, I'll say, a public AI or LLM like ChatGPT, where is that information going? But if you're building this out inside of the Snowflake environment, and you're managing it with something like DataOps Live, and you have the governance and even, you know, security, masking, all of that sort of stuff, then it's no longer going into a black box, I guess. Right. You know where the data is; it's contained. [00:11:15] Speaker B: Exactly, yeah. And like I said, a lot of times people have this fear factor because of so much that's out there about cloud AI. Of course, Snowflake lives in the cloud, so all of that just kind of snowballs. And, you know, I co-host and I'm a creator on a podcast, the Intelligence Economy, where we talk about this quite often, where we go into this fear factor that people have about AI and control of their data, and how we can help them overcome that. [00:11:46] Speaker A: Yeah. And I guess that's probably, again, one of the core benefits that I've always seen with Snowflake. And there was an education, for sure, for people to get to understand that if your data is in Snowflake, it's secured, it's encrypted, nobody else can get to it. Even though Snowflake is in the cloud, your account is your account, and it's secured, and nobody else, including Snowflake employees, can see that data. Right. And so I guess that's the education. So I guess, as the audience for the consumption of the data and the use cases regarding AI and LLMs have expanded, you've now got new people becoming interested in this.
Where we spent literally the six years I was at Snowflake educating people on: no, no, here's the security, here's what it looks like, here's all the tools that you have at your disposal to make sure nobody can mess with your data or see your data unless you grant them access to it through, you know, data sharing. So now do we have to go through that again with regard to AI? It's like, yes, there are environments that exist today that you don't have to go build yourself, where your source data is going to be secured, and you control what goes into the model, and you control where the output goes. And it's not going off into some cloud. I guess it would be a black cloud. Right. [00:13:22] Speaker B: I think what's great about that too, with Snowflake making a big play in the AI field, is that you don't have to move your data in and out of a system. You don't have to offload it to another system and then bring the results back in. Where you've set up your governance, where you've set up your security modeling, where you've set up all of your access controls inside of Snowflake, all of that still exists. And then you can leverage AI on top of it. You don't have to worry about managing access controls and security and governance in another system, or making sure that the communication between the two systems is secure. It all just happens in that contained environment or organization that, like you said, is truly your organization's. It's not Snowflake's. Yeah, it's on their hardware, but at the end of the day, it's yours. [00:14:09] Speaker A: Right? Yeah. Now I want to take a quick dive down just a slight branch here. Since you came from a DevOps background, a lot of people ask this question: what's the difference between DataOps and DevOps? [00:14:23] Speaker B: So the overlap between the DevOps world and DataOps is in your continuous integration and continuous development, your code management.
So both of them lean heavily into code sharing and into co-working alongside other developers. Proper branching, proper protocol inside of Git, and then that automation of, as I promote code up into higher branches, it actually rolls into higher environments. There's automated testing that you can do so that you can ensure that your code hasn't broken other things within your system. I would say that those are the overlaps. What I see as edging out, or being different from, traditional DevOps is the security and governance modeling inside of your data. So being able to keep track of your grants, keep track of your roles and permissioning, do data masking. Everything that you can do from a security and governance standpoint, you can put into your CI/CD, i.e. DataOps, and it manages that for you. And so you can always kind of have that change control in place, and you can always see when a change was made and who made it. You can go ask the question, why was it made? But then, going a level higher, now your data pipelines, your ETL and ELT pipelines, can also be managed in the same way, so that you know that when data is moving from system to system, it's always moving in the same way, no matter what. [00:15:53] Speaker A: Okay, good. Yeah. It's a little bit more than DevOps. It's the DevOps concepts, but the application of them to the data itself and the data pipelines, like you said. Data pipelines, yeah, you can think of those as code, and you've got to manage the code, but you also have to make sure that they're running and that it's orchestrated properly, whether it's supposed to be running in parallel or in serial. All of that has to be kept track of as well. And that then overlaps into this new area that people talk about called observability. Right. Being able to see all of that stuff. [00:16:32] Speaker B: Yes. Yeah.
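The "security and governance as code" idea Ronny describes here can be made concrete with a small sketch: a masking policy defined from configuration, so the policy text lives in Git and flows through CI/CD like any other change. The policy and role names below are hypothetical, and the DDL shape is based on Snowflake's masking-policy syntax as I understand it; verify against Snowflake's documentation before relying on it.

```python
# Security-as-code sketch: generate a Snowflake masking-policy statement
# from configuration. Names are hypothetical; the CASE/CURRENT_ROLE()
# pattern follows Snowflake's documented dynamic-masking style.

def masking_policy_ddl(name: str, allowed_roles: list[str]) -> str:
    """Build DDL that unmasks a value only for the allowed roles."""
    roles = ", ".join(f"'{r}'" for r in allowed_roles)
    return (
        f"CREATE MASKING POLICY {name} AS (val STRING) RETURNS STRING ->\n"
        f"  CASE WHEN CURRENT_ROLE() IN ({roles}) THEN val\n"
        f"       ELSE '***MASKED***' END;"
    )

print(masking_policy_ddl("email_mask", ["PII_ADMIN", "COMPLIANCE"]))
```

Because the allowed-role list is plain data, a reviewer can see exactly who gains access in a merge request, which is the change-control property discussed above.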
And I think that's probably the next evolution of DataOps, that observability factor. Being able to have those insights into how are my pipelines doing, what's failing, what's succeeding? How long is it taking to run? What are my success markers as a DataOps customer? Like, what makes me a successful customer versus an unsuccessful customer? And I think that's so interesting because, you know, like I said, in my past life, I worked for a company that developed CI/CD software specifically for Salesforce. And those were some of the questions that we answered for the core platform that we developed for our customers, which were Salesforce customers. And that was: what makes a successful customer versus an unsuccessful customer? And how do you mark that, how do you measure that, and then how do you communicate that back to the end user of the platform? [00:17:28] Speaker A: Yeah, yeah, okay. And I know that they've definitely got some unified observability concepts built into the DataOps Live platform now. So more and more, you as an organization have access to all of that information. It's not just orchestrating the data, but being able to see the orchestration of the data. [00:17:51] Speaker B: Right, right, exactly. [00:17:53] Speaker A: So do you think, the way things are going with this growth in AI and more and more data, everybody wants even more data than we wanted before in data warehousing. Are we going to be able to manage this sort of stuff at scale if we're not doing something like DataOps and having that kind of mindset and approach? [00:18:13] Speaker B: The broader question of can we do it at scale? I think that comes down to yes. I mean, anything is possible. We can definitely do it at scale. I think the real question there is, can we easily do it at scale? Can we do it without a lot of overhead, whether that's technology overhead or workforce overhead? And to that question, I'd say no.
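The observability questions raised above (what's failing, what's succeeding, how long runs take) ultimately reduce to simple aggregates over pipeline-run records. As a toy sketch, with an invented record shape:

```python
# Illustrative only: summarizing pipeline-run records into the health
# metrics discussed in the conversation. The record fields are invented.

def pipeline_health(runs):
    """Summarize success rate, average duration, and failing pipelines."""
    ok = [r for r in runs if r["status"] == "success"]
    return {
        "success_rate": len(ok) / len(runs),
        "avg_duration_s": sum(r["duration_s"] for r in runs) / len(runs),
        "failing": sorted({r["pipeline"] for r in runs if r["status"] == "failed"}),
    }

runs = [
    {"pipeline": "orders_elt", "status": "success", "duration_s": 120},
    {"pipeline": "orders_elt", "status": "failed",  "duration_s": 95},
    {"pipeline": "vault_load", "status": "success", "duration_s": 310},
    {"pipeline": "vault_load", "status": "success", "duration_s": 290},
]
print(pipeline_health(runs))
# {'success_rate': 0.75, 'avg_duration_s': 203.75, 'failing': ['orders_elt']}
```

A real observability layer adds trend lines and alerting on top, but the raw material is run metadata like this.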
We've seen time and time again, just within our industry, that our customers really struggle to work at scale on large-scale data models, large-scale AI initiatives, things of that nature. And in the areas that we've implemented a tool like the DataOps Live platform, it has just simplified life across the board, from development all the way up to executive leadership having trust in their data. It's just been something that, when we talk about at scale, gets you there faster, gets you there easier, and gets you there in a more cost-effective manner. [00:19:12] Speaker A: Right. So really managing the total cost of ownership of doing it. We can muscle our way through it if we have to. Right, exactly. But at what expense? [00:19:23] Speaker B: Yeah, and I would say that your expense is going to be both time and money, and trust or confidence, if you're muscling your way through it. I think that your cost and your time go up and your trust and confidence go down, and really those two should be reversed. And I think that's what DataOps Live does: it reverses those two. [00:19:46] Speaker A: Well, that gets you into the buy versus build conversation, right? [00:19:49] Speaker B: Yes. [00:19:50] Speaker A: Like, yeah, you might have a lot of smart engineers, and yeah, you understand DevOps, you understand the DataOps concepts, and you could build all of this software yourself, but then you have to manage it and maintain it, and it's going to take time, right, to get to that end state that you could implement fairly quickly if you started with a platform that actually already had all those features. [00:20:12] Speaker B: Right, exactly. I mean, I remember whenever we first wanted to do CI/CD in Snowflake, they had a... I can't even think of the name now. They had a little command line package that was out there that was put out by the Snowflake organization.
[00:20:24] Speaker A: Yeah, it was one of the solution architects at Snowflake, I think, who built that and put it out on Git. And it was kind of open sourced. [00:20:33] Speaker B: Right. [00:20:33] Speaker A: It wasn't officially supported by Snowflake, but it had been developed by somebody who worked at Snowflake. [00:20:38] Speaker B: Yeah. And so we had tried to build a CI/CD solution using that. And this was before, you know, we discovered DataOps Live as a platform. It may have even been before DataOps Live came out onto the market. And it was so time consuming, it was so expensive to implement, because we had to do everything from scratch. Whereas in our most recent implementation of DataOps Live, they had already built out a fairly substantial data model on the Data Vault methodology, which can be highly complex, and we had to convert that over so that it could be managed by DataOps Live. And from the point where we turned the system on to actually having full management by the DataOps platform was two, maybe three months. I think what really took us that long, though, was some of the concepts of using dbtvault and making sure that we were setting up our pipelines correctly to support that Data Vault methodology. I think that's probably what really dragged out the implementation process, more so than just spinning up the tool and reverse engineering the model into the tool. [00:21:48] Speaker A: Right. We have a question from JT here that I think you could probably help him with: wondering, as a Snowflake partner, how can we certify and partner with DataOps Live to be linked with customers needing consultation and support using both platforms? [00:22:09] Speaker B: Great question. So just from our experience, I think the best way to certify as a partner is to just get out there and use the tool. I think that there's no better way to learn this tool than to use it.
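The Data Vault work mentioned above leans on one convention worth illustrating: hub keys are typically a hash of the normalized business key(s), so the same key always resolves to the same hub row regardless of source system. The normalization rules below (trim, uppercase, a `||` delimiter, MD5) follow a common Data Vault 2.0 practice, but a given dbtvault or project configuration may differ.

```python
# Sketch of a common Data Vault 2.0 hash-key convention; your project's
# normalization and hash choice (MD5 vs SHA) may differ.

import hashlib

def hub_hash_key(*business_keys: str) -> str:
    """Hash normalized business keys into a deterministic hub key."""
    normalized = "||".join(k.strip().upper() for k in business_keys)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest().upper()

# Equal keys (after normalization) always hash identically.
print(hub_hash_key(" acme ") == hub_hash_key("ACME"))  # True
print(len(hub_hash_key("ACME")))                       # 32
```

Deterministic keys are what let loads run in parallel and re-run safely, which is part of why Data Vault pairs well with automated pipelines.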
You can use it with a real world example, you can use it with test data, dummy data, dummy data models. But using the tool, I think, has really been the key part of getting to that certified level. In terms of partnering, again, it's showing that you know the platform and showing that you're hungry for what the platform can offer to a customer or to an organization. One thing I've really liked about working with DataOps is that they're not focused on signing that next contract for that next deposit. They're focused on solving a real need for their customers, and they're very passionate about solving those needs. They take this extremely hands-on approach to every customer that they onboard, at least in my experience. And that's what I've seen. And so when you show that same initiative, you show that same hunger to do what's right by the customer, that's where I see DataOps really stepping up and partnering with an organization and saying, yes, we want you as a partner, because at the end of the day you care about the same things that we care about, which is the customer and doing what's right by them. [00:23:32] Speaker A: Yeah. So if I remember correctly, if you're in Snowflake, it used to be called Partner Connect. Right. There was a button there that you could actually push to basically connect a free trial version of DataOps Live to your Snowflake environment and start playing around with it. There's some online training, there's a bunch of webinars, things like that, right? [00:24:02] Speaker B: Yeah, there are. [00:24:03] Speaker A: Plus, of course, the user guide and all of that. [00:24:06] Speaker B: And I know that they are working to roll out an internal academy. Of course, their community is very active. [00:24:13] Speaker A: That's right, our community. [00:24:14] Speaker B: Yeah, the community is great to learn from.
And then for those that are hungry to partner and work to bring in new customers, I've found that DataOps is great at working with those people that are hungry for it and giving them the tools that they need, whether it's a little internal trial environment or something like that. Yeah, I see someone posted the free trial up in the chat. There are a lot of things out there that DataOps tries to make available to the partners. And at the end of the day, I would say take the initiative, go out there, jump into it, and just let them know who you are. They're great about responding to emails. It's not like your emails go into a black hole. [00:24:54] Speaker A: Yeah. And I know we have our online community as well, so maybe we can get the URL for the community up here too, during the rest of our conversation, so people can take a look at that. Okay, so we're getting kind of near the end, but I did want to ask you about data products, since that's become one of the hot industry buzzwords. Are you running across that much with your customers? Are they starting to talk about data products, you know, on Snowflake, and using DataOps Live and things like that? [00:25:26] Speaker B: Some, but definitely not a lot. In terms of the customers that we deal with, I'd say that in large part they're looking to make a transition into Snowflake and then to start doing things like that, getting off on the right foot. And I think a lot of times, too, with the customers that we're working with, because if you think healthcare, you think finance, you think some of those heavily regulated industries, they tend to shy away a lot from what might be considered the buzzword or the trend of the day. I think we've seen data products for a long time. I think it's just that now we've decided to put a label on it and say this is what falls under the umbrella.
If you think back to, like, Web 3.0 and things like that, it's like the Internet didn't evolve; it's just our idea of what makes a good website evolved, right? [00:26:20] Speaker A: Yeah, yeah. And I think, you know, the publication of the papers around data mesh back in 2019, I think it was, kind of popularized the concept of data products. Right, right. And I think, you know, the Snowflake platform is kind of an ideal one for doing it because, you know, they evolved the whole Snowflake data marketplace, and now it's just the Snowflake Marketplace. And so that concept of creating shareable resources, for lack of a better term. Right. You know, that's kind of the beginning of a data product, and some people are trying to monetize their data, and, you know, it gives you kind of an interesting perspective on how to monetize that data and how to package it. And if you think about packaging it, I guess, okay, it's a product now. Right. We're going to give it a name, we're going to put it out there, and see if anybody wants to use it. [00:27:16] Speaker B: Yeah. And I think that's, you know, something that's going to continue to grow as people start having trust in these systems and in the organizations that are backing them. You think about all of the data that's in healthcare today. If they could, you know, scrub that data from a PII perspective and a HIPAA perspective and publish out certain metrics of data, what could these little startups that are starting to leverage AI in the medical space, as an example, start to do with that data? And so when we talk about the data marketplace that Snowflake has, I see a lot of growth potential there, and I'm really excited to see some of these bigger organizations starting to jump to the front of the line and start pushing some of their data out there so that the smaller organizations can consume it and build on it.
I mean, at the end of the day, we have to remember that we build on each other, and if there's something out there that another company has that they can provide to help grow the ecosystem, and grow technology or grow advancements, then I think that's great. [00:28:21] Speaker A: Yeah, absolutely. Yeah. During COVID, there was an organization called Star Schema that actually published a lot of the COVID metrics and, you know, cleaned it up, went and got it from Johns Hopkins and various other health organizations, and put it all together. And it was probably one of the most highly consumed data sets in the Snowflake data marketplace. And, you know, they made that available for free, so that was kind of the beginning of the whole thing. [00:28:47] Speaker B: Yeah. And then I think data products are also going to change how we do things as consumers as well. If you think about coupling stock market data, as a data product, with AI, at that point, does that change how we as consumers spend our money? Does it change how we invest? Does it change our investment strategy? Do we invest ourselves? Do we invest using an investor or a broker? I think that even at the consumer level, data products are going to result in a lot of changes in how we're consuming content and how we're reacting to that content. [00:29:23] Speaker A: Yeah, yeah. So JT's got another question, about phData, and if there's any connection there with DataOps and Snowflake, and I'm trying to remember. I recognize the name. Sorry. [00:29:36] Speaker B: Yeah, I recognize the name, but I don't remember. [00:29:38] Speaker A: I don't. I know they're... I vaguely remember. I think they're a partner with Snowflake, for sure. And I'm not sure if they're a partner with DataOps. They might be; I'll have to go look on the DataOps Live site myself to see and look at the partner list. Sorry about that, JT.
[00:29:55] Speaker B: I will say, JT, that even if not, DataOps Live has their data products kind of functionality, and you can even build some of your own integrations into the tool. So they have a DataOps Live API. You can deploy out the necessary Docker image after you've coded what you need, so you can do some of your own custom integrations as well. So if there's not one out there for phData, maybe in the interim it's something that, you know, would be a good business idea that DataOps Live might want to take off your hands in the future. [00:30:32] Speaker A: Yeah, well, unfortunately we're kind of running out of time here. How can people connect with you, Ronny, to continue the conversation? [00:30:43] Speaker B: Yeah, so they can connect with me on LinkedIn. You can search under Ronny Steelman; it's linkedin.com, R.L. Steelman. So if you want to connect with me there, that's a great way. You can follow me on Twitter, R.L. Steelman, as well. You can go to our website, quadrubyte.net; all of our social media and ways to contact us are there. And then of course, we encourage you to follow our podcast, the Intelligence Economy, where we're always talking about data, data initiatives, and products like DataOps Live. Our next podcast is going to be talking about my book that's coming out, around leveraging, we're calling it, Mastering the Snowflake SQL API Using Back-end Programming. How can you go outside the box and use what Snowflake is providing to you at the API level to really enhance your Snowflake experience? [00:31:37] Speaker A: Cool. Good. Well, congratulations on the book. Glad to hear there's more stuff coming out. Using the Snowflake API is definitely, I'll say, one of the more mysterious aspects of Snowflake. So I'm sure that will help a lot of people. [00:31:52] Speaker B: Yeah, definitely. [00:31:53] Speaker A: Yeah. Well, thanks for being my guest today, Ronny. It's been a great conversation.
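For readers curious about the Snowflake SQL API mentioned above: it is a REST endpoint that accepts a SQL statement as JSON. The sketch below only assembles the request (no network call, and authentication is omitted); the path and field names reflect the v2 API as I understand it, so verify them against Snowflake's SQL API reference before relying on this.

```python
# Sketch only: building a Snowflake SQL API request. The endpoint path and
# JSON field names are assumptions based on the documented v2 API; the
# account/warehouse names are hypothetical, and auth headers are omitted.

import json

def sql_api_request(account: str, statement: str, warehouse: str) -> dict:
    """Assemble (but do not send) a SQL API statement-submission request."""
    return {
        "url": f"https://{account}.snowflakecomputing.com/api/v2/statements",
        "method": "POST",
        "headers": {
            "Content-Type": "application/json",
            "Accept": "application/json",
            # Authorization (e.g. a key-pair JWT bearer token) would go here.
        },
        "body": json.dumps({"statement": statement, "warehouse": warehouse}),
    }

req = sql_api_request("myorg-myacct", "SELECT 1", "COMPUTE_WH")
print(req["url"])
# https://myorg-myacct.snowflakecomputing.com/api/v2/statements
```

Any HTTP client can then send the assembled request, which is what makes the API useful from back-end code without a Snowflake driver installed.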
Enjoyed having you on. Thanks, everyone, for joining in, and JT, thank you for all the questions, and the other folks who posted the links, thank you for participating today. Be sure to join me again in two weeks. My guest is going to be a buddy of mine from Snowflake, Vernon Tan, who's just an exceptional sales engineer, and he's now the leader of the Frostbyte team at Snowflake, and we're going to talk about some really exciting things that Snowflake is doing internally with DataOps Live, so you don't want to miss that show. And as always, of course, be sure to like the replays from today's show and tell your friends about the True DataOps podcast. Don't forget to go to truedataops.org and subscribe to the podcast so you don't miss, especially, this next coming episode. So until the next time, this is Kent Graziano, the Data Warrior. See you later.
