Episode 57

January 06, 2026

00:23:05

#TrueDataOps Podcast – Angela Harney - Ep.57 - S4.EP4

Hosted by

Kent Graziano

Show Notes

Keith Belanger's guest is Angela Harney, a fellow Snowflake Data Superhero, joining him to talk about the topic of the year: 'AI Ready Data'.

Angela Harney has 30+ years of IT experience spanning data engineering, architecture, and teaching. A four-time Snowflake certified expert, she’s worked with Snowflake on AWS and Azure while focusing on Cortex AI, AISQL, Cortex Agents, and Snowflake Intelligence solutions.

She's passionate about SQL-driven design, Python-based ETL, and using code generation to build scalable, high-performance data solutions.

Angela is the author of Snowflake How-To, with articles on Medium.com. She is also the chapter leader of the Snowflake Seattle User Group and creates Snowflake training videos on her YouTube channel.

You can also find Angela on LinkedIn & YouTube.


Episode Transcript

[00:00:00] Speaker A: Hey everyone. Welcome to another episode of the True Data Ops podcast. I'm your host Keith Belanger, field CTO at DataOps Live and Snowflake Data Superhero. Each episode this season we are exploring how DataOps is transforming the way organizations deliver trusted, governed, and AI-ready data. If you've missed any of our previous episodes, you can catch up on the DataOps Live YouTube channel. Be sure to subscribe and get notified about upcoming episodes. My guest today is Angela Harney, senior Snowflake Solutions Consultant at cloudhive and fellow Snowflake Data Superhero. Angela has been digging deep into AI-ready data, the emerging intelligence layer, and what it means to design data foundations for AI. Angela, welcome to the show. [00:00:53] Speaker B: It's always good to talk to you, Keith. [00:00:55] Speaker A: Yes, it's been a pleasure. And I know over the last few weeks you and I have had various conversations, just not for public. So hopefully we can bring a lot of those conversations we've had out to everybody else. Now you have spent a lot of time over the last few weeks, and some people out there may have seen it or not, but we'll get into that today, researching and writing about AI-ready data. Can you talk a little bit about what you've been doing? [00:01:23] Speaker B: Yeah, it's data that's been curated and enriched specifically for AI. Everybody knows what curated means. It means, you know, no nulls in your input parameters. You don't want to wait till that input parameter reaches your LLM and errors before you handle things like that. You want to do that early on. You want to make sure that you've got valid, good default values that support your AI goals in those input parameters. And it's enriched. So if you've got a column of data that's got some pretty static definitions to it, go ahead and use that AI_CLASSIFY AISQL function in Snowflake to pre-land a column of data that stores that classification.
Or if you have a column that has some pretty static sentiment, and it's not going to be impacted by conditions on the fly or anything, go ahead and use that AI_SENTIMENT AISQL Snowflake function to land a column that stores that sentiment score. It's like this basic pre-reasoned data that you can then feed into that next layer of evaluation. It's also data that's been combined and flattened to store the relationships between the data. So if you've got sentiment data, it can be on a row with that purchase data and any projections you might have that might apply, and what AI can do with that is much richer because it has those relationships already put together. [00:02:55] Speaker A: Right. So, you know, it's interesting that you were talking a lot about, which we are today, right, really, the data. But sometimes when talking with people and you start talking about AI-ready data, they really want to just jump into the models and the LLMs, right? And what's your kind of take on that? [00:03:12] Speaker B: Well, a number of people think that AI-ready data is data that's been through their LLMs once, but that might not be reliable, because the first thing that breaks down with that is that when you query relational data, the joins are inconsistent and unstable and you get results that you aren't sure of. Also, pieces of the solution are disparate tools and difficult to combine for one cohesive picture of what information is available. [00:03:40] Speaker A: Yeah, I'm gonna, kind of, based on what you were saying, do you find that you should work with maybe a smaller data set to get started? Or do you find too many people working with very, very large data sets and trying to, you know, boil the ocean, I guess you could say, when they start getting into those AI initiatives? [00:04:06] Speaker B: Yeah, when? Go ahead. [00:04:08] Speaker A: No, go ahead. I didn't mean to cut you off there.
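Angela's "pre-reasoned data" pattern — landing classification and sentiment columns once, up front, then flattening them onto the related purchase rows — can be sketched in plain Python. The classify and sentiment stubs below are hypothetical stand-ins for Snowflake's AI_CLASSIFY and AI_SENTIMENT AISQL functions (in Snowflake the real calls would run in SQL over the column), and the sample data is invented for illustration:

```python
# Hypothetical stand-ins for Snowflake's AI_CLASSIFY / AI_SENTIMENT AISQL
# functions; placeholders only, so the enrichment-and-flatten pattern can run.
def classify(text: str, labels: list[str]) -> str:
    # naive keyword match standing in for the real model call
    for label in labels:
        if label.lower() in text.lower():
            return label
    return "other"

def sentiment(text: str) -> float:
    # placeholder score in [-1, 1] standing in for the real model call
    positives = {"great", "love", "fast"}
    negatives = {"broken", "slow", "hate"}
    words = text.lower().split()
    raw = sum(w in positives for w in words) - sum(w in negatives for w in words)
    return max(-1.0, min(1.0, raw / max(len(words), 1) * 5))

reviews = [
    {"purchase_id": 1, "text": "Love this laptop, shipping was fast"},
    {"purchase_id": 2, "text": "The charger arrived broken"},
]
purchases = {1: {"item": "laptop", "amount": 999}, 2: {"item": "charger", "amount": 29}}

# Pre-land the enrichment columns and flatten them onto the purchase row,
# so downstream AI sees the relationships already joined together.
ai_ready = []
for r in reviews:
    row = dict(purchases[r["purchase_id"]])
    row["category"] = classify(r["text"], ["laptop", "charger"])
    row["sentiment_score"] = sentiment(r["text"])
    ai_ready.append(row)
```

The point of the pattern is that the expensive model calls happen once, ahead of time, and each flattened row already carries the classification, the sentiment score, and the purchase relationship for whatever reasons over it next.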
[00:04:12] Speaker B: Well, you know, those data sets, I think that they should be put together so that you have your input data, but then you've also got those smaller training data sets. But in addition, there should be a smaller data set even for the developer that's just got a few rows to it. So while that developer's working on their syntax, they don't have to kind of wait and have it go through a full-blown LLM set. So those training data sets, all of that is very important, I think. [00:04:45] Speaker A: So you've been spending a lot of time writing a lot of stuff, and I've been kind of like, even myself, trying to catch up. And you write about this idea that AI-ready data needs a home. Where do you believe AI-ready data should reside in a modern architecture? [00:05:05] Speaker B: I see that there's an intelligence layer that resides over that semantic layer, where you store those smaller data sets and that pre-reasoned data. So that input data set has been combined and flattened and resides in an extra layer. Because what you want to do is leverage what was kind of like wrong and right about even your output data sets, store those, and combine that back together with your input data sets so that you can have improved reasoning. That's that next iteration of value you can get out of your data. [00:05:42] Speaker A: Great. You've also described it, you kind of just brought it up there a little while ago, this intelligence layer as kind of this like global dictionary, you know, and hub for AI. Can you explain what that means and why it's important? [00:05:59] Speaker B: Yeah. The intelligence layer is like a central nervous system. It operates in response to what it learns, like a brain. And it can serve as a consistent base for extended systems. It consists of a metadata engine that stores the relationships between those output data sets, the input sets, the input parameters, to enhance what can be evaluated.
Combining joins from relational data, all of that information can be landed in that intelligence layer. [00:06:32] Speaker A: Nice, nice. Now, if this intelligence layer, as you're saying, is the brain of things for a DataOps organization, where does DataOps fit in making that layer trustworthy and usable at scale? [00:06:50] Speaker B: Well, when you combine those joins and the relationships, you want to pipeline that into those flattened data sets and land that pre-curated data that's specific for parameters, those training data sets. So there's that layer of DataOps that needs to happen there. The great thing about tools these days is that they're all available in one place. Like the dynamic delivery tool that your team has available at DataOps Live that operates in Snowflake to automate CDC and CI/CD, so that it can take advantage of Snowflake's Cortex built-in LLMs, all leveraged in one tool. It just makes it all so easy. [00:07:35] Speaker A: Yeah, you know, Snowflake, over the last year, and I know you have really dove in, I've been watching your journey and all of these capabilities at Snowflake. And it's amazing how many capabilities Snowflake has brought to the table. And you've done a great job at trying to keep up. I guess that's what I call it, keeping up with all these capabilities. You know, how has that journey been, you kind of learning everything inside Snowflake? [00:08:09] Speaker B: That's one of the things that I've challenged myself to do, is to put myself on that journey and take steps every, you know, few weeks to go out and learn the different components to put together for folks to be educated about AI. You can't kind of go learn everything in a school these days. You got to go out there online and be looking. You know, Snowflake's got a bunch of great getting started tutorials.
I know there's a lot of videos that DataOps Live has available on their site for learning about CI/CD for these tools. There's a lot of information available. There's books to read, all kinds of good stuff out there. So I encourage everybody to take steps and get themselves up to speed. [00:08:58] Speaker A: I can remember a day when it was just like, oh, I just need to know how to create a table and make a relationship to a table and write SQL, right? Now it's like every time I turn around there's some new capability, some new solution. It's crazy. You know, in our conversations over the last few weeks, you had mentioned that you're kind of creating an AI-ready data certification framework. What problem are you trying to solve, or why are you really approaching this certification solution, or framework, I guess you'd say? [00:09:37] Speaker B: Well, a lot of tools out there have an AI readiness score. It might tell you that you're at 86%. But an AI-ready data certification focuses on the values inside the data. It's a lightweight data certification framework that tells you that the relationships between your data are effective and that the quality of the information you input is 100%. Because what you want to do is, once you're operating on those multiple layers of reasoned information, and you know that you are, because the certification tells you that you are, you can know that the outcomes of your evaluations are closer to your targets. So that when you get your results, you can trust them more, because you know what your data is all about. [00:10:23] Speaker A: You know, one thing here with the DataOps pipelines that we have organizations using within our solution is, to me, I see like that perfect marriage of, you know, here are those certification frameworks, here are those tests, here are those things.
And then embedding them into that pipeline so that, you know, it's not a "hey, I'm just going to do this one time," because I think, as you've seen, all it takes is one row, or a new data source that comes in. So the code might have been fine two weeks ago, but if you're not doing those tests in your pipelines, well, then there goes your certification, right? It just goes out the window. [00:11:00] Speaker B: Yeah, yeah. [00:11:03] Speaker A: So back to, you know, the certification framework. Why do you think that certification is important, especially for enterprises trying to scale their AI responsibly? [00:11:16] Speaker B: Well, again, it says that that information that's used for insight is known, it's effective, it's enriched with that basic AI reasoning, it's correctly related. And that can be a big help towards decreasing the hallucinations and that distrust. It's a real asset, a direct asset, for a company to know that they can trust their high-level insights. [00:11:40] Speaker A: Yeah. Expanding upon, like, the structure into trust and governance, what other aspects of AI-ready data do you think organizations often are underestimating? I mean, even, like, you've been doing so much of this research. I think of a lot of people I interact with, and the person who's really dived into AI, it's been definitely a lot of the work you've done. You know, what do you think organizations are completely underestimating? [00:12:10] Speaker B: Well, to me, governance in AI-ready data is the life cycle of the information. You know, one of the things that we now have is a lot more historical data than in the past. But that historical data can kind of be understood to be of lesser weight, because it can be incomplete, and the newer data in the systems can be kind of more trustable.
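The lightweight certification checks discussed above — no nulls in required inputs, and relationships that actually resolve — are the kind of thing a pipeline can run on every load rather than once. A minimal sketch, where the rule set, column names, and function name are illustrative assumptions rather than Angela's actual framework:

```python
# Illustrative certification check a DataOps pipeline could run on each batch.
# Names and rules are assumptions for the sketch, not a real framework's API.
def certify(rows, required_cols, known_customer_ids):
    """Return (passed, issues): flag nulls in required columns and any
    foreign key that doesn't resolve, so the relationships AI relies on
    stay intact from one deploy to the next."""
    issues = []
    for i, row in enumerate(rows):
        for col in required_cols:
            if row.get(col) is None:
                issues.append(f"row {i}: null in required column '{col}'")
        if row.get("customer_id") not in known_customer_ids:
            issues.append(f"row {i}: unresolved customer_id {row.get('customer_id')}")
    return (not issues, issues)

customers = {101, 102}
batch = [
    {"customer_id": 101, "amount": 50.0},
    {"customer_id": 999, "amount": None},  # a new bad source: broken join + null
]
passed, issues = certify(batch, ["amount"], customers)
# A pipeline would fail the deploy here (and the certification lapses)
# rather than let the bad rows reach the LLM:
#     if not passed: raise RuntimeError(issues)
```

Because the check runs inside the pipeline, the certification reflects the data as of the latest load, which is exactly Keith's point about it going "out the window" the moment the tests stop running.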
And, you know, you may know of upcoming new tools that will be even richer, so that your current data at some point will be of lesser weight, because you're going to be able to add more value-add pieces to your system with that newer data that's coming up. So when you're looking at an intelligence layer from an architectural standpoint, you know, be sure to reserve tags in Snowflake about the governance of that weight, of the level of importance of that data, and how current and relevant it may be. So that in Snowflake, when you tag that data in that intelligence layer, it's a way to ensure that those relationships between the metadata and that effective layer are going to be what you expect them to be on an ongoing basis, and it can travel over time with that life cycle of the governance that you do. [00:13:43] Speaker A: Yeah, you know, I'm going to go on my historical rant, I guess you could say, at the moment. You know, I started in data about 30 years ago, and back then we weren't going at the pace we're going now. The data wasn't at the volume or the scale we're at now. But back then, when I was doing it, we put a lot of time and emphasis into, like you were talking about, the governance. We didn't have tags the way we have them now. But, you know, even when I used to data model the data, how I would structure it, how it was relative to business context, we made a lot of that part of the delivery process. You know, we had a lot more time than we have now. So we put a lot of this in, and I saw a lot of that, I'm gonna call it, fundamentals of data management kind of erode over the years. You know, we got into Hadoop, right, the schema on read, all this other stuff, and I found a lot of that "oh, we don't need to do that anymore," right? We're just going to put the data in, you know, do some queries, wrangle the data, and out come some results.
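The recency-based weighting Angela describes could, purely for illustration, be computed like this before being attached to data via a Snowflake object tag. The half-life decay policy and the function name are assumptions for the sketch, not something specified in the episode:

```python
from datetime import date

def recency_weight(loaded_on: date, today: date, half_life_days: int = 365) -> float:
    """Hypothetical governance weight: halves every half_life_days,
    so older (potentially incomplete) data counts for less."""
    age_days = (today - loaded_on).days
    return 0.5 ** (age_days / half_life_days)

today = date(2026, 1, 6)
w_new = recency_weight(today, today)             # brand-new data: weight 1.0
w_old = recency_weight(date(2025, 1, 6), today)  # one half-life old: weight 0.5
```

In Snowflake, the resulting weight (or a bucketed version of it) is the kind of value that could be stored as a tag on tables in the intelligence layer, so downstream reasoning can discount stale data consistently over the data's life cycle.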
Now that we're kind of moving into AI, it's interesting, because you were just talking about governance and policies and tags, and it's kind of like reverting back to, like, hey, we got to get back to these fundamentals, because you can't just do a schema on read and think you're going to throw that into AI and it's going to understand it. So to me it's just, you know, a bit... I don't know if it's comical, or if it's kind of like part of me wants to say I told you so, but, you know, I won't do that. But, you know, what's your take on that? Do you feel like, in a similar way, some of these old practices are kind of rearing their faces again? [00:15:41] Speaker B: Well, I think that they're rearing their faces again. Anytime you're going to introduce sort of a new layer of architecture, like when the semantic layer came around, you know, you want to kind of land that reporting data on top of your data mart. So take a look at not dismissing that sort of historical data architecture, you know, that Kimball and everybody has done, but, you know, it's just kind of time to take a look at what's the next new thing. So on top of that semantic layer, there's that intelligence layer that's governed, and that DataOps helps us to be able to sustain, and the compliance of it is kept up, so that that trustworthiness about AI is there, because you know your data. So, you know, DataOps really gives us that ability to help manage that data movement so that we can focus on evaluating and acting upon insights. [00:16:40] Speaker A: Yeah, you know, you were just talking about DataOps there, and I've seen a lot of organizations will say they have DataOps in some formation, but I've definitely seen now that DataOps maturity changes an organization's ability to sustain that capability for being AI-ready over time. Like, what is your thought on that maturity aspect of DataOps and their practices?
[00:17:13] Speaker B: Well, like I said, one of the things about the roadmap ahead, part of that learning journey that you go on, is that you go out and take a look at the tools that are available too. So, you know, look up AI readiness data tools. Go look up, you know, what that playing field is, because some of those tools are going to be what takes us there and helps us get to more enriched, you know, trustworthy sets of data. DataOps is one of the things that I think every data engineer should also put into their skill set going forward. [00:17:54] Speaker A: Yeah, it's one of those ones. I'm going to go back, you know, about six years, and with the teams I was on, we used to just call it, you know, common services. We didn't really use the term DataOps, right? If you go back like six, seven years, it wasn't as prominent as it is now. But I completely agree with you. That has to become something where every organization is understanding the practice of DataOps, you know, and at DataOps Live, our whole thing is beyond just DataOps as a practice. It's automating, right? Again, you could take all these things, but, you know, in many organizations, we used to do like one deploy a week, and we thought we were like, wow, you know, at the end of the sprint, we're going to do a deploy. I think with AI, if you're not deploying things daily, or even sooner than that, hourly, you know... You can't wait a week if something is awry or has to be fixed. [00:18:54] Speaker B: Yeah. Companies spend more of their staff's time on what it takes to do deployments and things like that than they realize. So if they just take a look at that cost and, you know, resources, availability of information, and do something like... You know, Snowflake is subsidizing, you know, 500 free minutes a month for DataOps Live, their dynamic delivery tool, so that it's basically free.
So there isn't a reason why anybody shouldn't hop in and be utilizing that as part of their standard Snowflake practices. [00:19:30] Speaker A: Yeah, it's always good to hear others, you know, mention that. I often hear, and you may have said it too, when talking to, like, the data scientists or even analysts, and they start saying, yeah, I spend like 80% of my time cleansing data or wrangling data, like it's the norm. And I'm like, there are far better things you should be doing with 80% of your time than something that's not really bringing a lot of business value. And the fact that that's become a norm and an okay thing, it's just like, what? How can that be? You know, I would rather be spending 80% of my time finding value in that data. So before we come to an end with the show, I'm going to kind of give you an opportunity. You've been on this journey, you've been diving into it. If you had some parting words for architects and engineers, or maybe folks who haven't even started that journey yet, maybe they're afraid or think they're in over their heads, what kind of words of wisdom do you have for folks out there looking to maybe go into 2026 being in the AI space? [00:20:44] Speaker B: When you don't know what first step to take on your AI journey, go out to my LinkedIn articles under my AI learning journey. Because one of the things I'm doing is carving that path for others to come follow, because I didn't know what first steps to take either. And I think that businesses, you know, they don't either. So, you know, a couple of the articles that I've recently written: the one on AI Ready Data First Steps. Well, actually, and then there's also one called AI and the Intelligence Journey.
You know, they both kind of explain those steps for businesses to take, and that AI Ready Data one talks about: go out and know what your AI objectives and goals are for your organization first. Then go get your data ready to support that. I think people, when they even start out their journey... My first steps, I didn't really know what direction it was going to take me, but I could tell after the first few points what years of experience are good at. And if I take that forward for folks, that's really that underlying substrate that everybody's going to need. If the data isn't good and you don't know your goals, then you're not going to arrive at that trustable solution that's available to you. [00:22:16] Speaker A: I agree. It's always fun to want to go play with the new shiny toy, but really building that data foundation, I think, is really key, and why we're really pushing for that AI-ready data. Well, Angela, this was a fantastic conversation. Glad we could have got some of our off-camera conversations onto camera so people could hear it. I appreciate you joining me today. [00:22:42] Speaker B: You bet. It's good to talk. [00:22:44] Speaker A: And thank you for sharing your journey and helping push the industry toward more intentional, trustworthy AI data. Until next time, this is Keith Belanger. Remember: good enough data is not AI-ready data. Thanks, everybody. [00:22:59] Speaker B: Thanks.
