Podcast

#92 – Data insights from 1.9 million food logs (Ben Grynol & Helena Belloff)

Episode introduction

Show Notes

Humans create an amazing amount of data each day. At Levels, we’ve already collected over 100 million health-related data points from our member logs. The question is, what do we do with all this data? Levels Head of Growth Ben Grynol sat down with Levels data scientist Helena Belloff to talk about how we can make that data useful for our members, how personalized insights can be so powerful, and the connections that can be made between those data points and health.

Key Takeaways

08:52 – General advice is helpful, but personalized is better

Helena has run research projects where members are given general or personalized insights and feedback based on their logs. Members have found the personalized insights more helpful.

It’s helpful to know, here’s in general how you can avoid feeling fatigued and irritable at 2:00 PM, 3:00 PM at your desk. Here’s just some general things you could try out to begin with. Maybe here’s a little short article about why these things work. Then as you log more food and you learn more about how you respond to certain things, we can come in and say, “Hey, this part of that general advice is something that works really, really well for you. Here’s something else that you might want to try because we noticed in your data that you’re not doing very well with chickpea pasta,” or something like that, “and other people similar to you have had more luck with these alternatives. Try some kelp noodles,” or something. So yeah. There’s a massive opportunity to shift more towards personalized healthcare because at the end of the day everyone is different. We can’t just throw a dart at the board and hope that it hits something that’s meaningful for everyone. That’s just never going to happen.

13:05 – Make small behavior changes

A balanced diet is called that for a reason. Giving people options for small, healthier changes they can make, instead of giving up certain foods altogether, is the best approach.

I think if you’re someone who’s logging pasta, let’s say down the road we put an insight in the app where it’s like okay, now you’ve logged pasta a bunch of times. We can model how you’re going to react and project out what’s going to happen if you eat pasta again. I’m still going to eat the pasta because I’m logging it. It’s in front of me, I’m going to eat it. But maybe there’s something I can do to mitigate the level of spike that is going to result from that. Maybe I take a walk right after or maybe I pair it with protein or maybe next time we have the capability to tell someone this is how you’re going to feel after you eat this. Maybe I still eat it but maybe next time I decide to make a different choice. It’s all about small behavior changes. I don’t think that we’re going to overhaul the food industry. I think we’re going to change the way people are living and how they approach balance, I guess. If we’ve learned anything it’s that a balanced diet is a real thing and it works.

17:31 – Not spiking doesn’t mean it’s healthy

Pizza doesn’t cause a spike for many people because of the fat and protein from the cheese. That doesn’t mean you should eat it every day. It’s important to understand why things spike or don’t.

I think pizza was one of the first things I was shocked by, where I was like wow, I don’t spike at all with pizza, it must be healthy. But I know that it’s not. I think the same thing could be said for alcohol, for example. That’s one that I think people sort of see as they go through Levels and as they log it, but if you have a glass of wine before a meal it might inhibit the level of spike that you’re getting. And that doesn’t necessarily mean that you should have alcohol with every meal. But understanding why that happens and why that might confound your results and being informed and mindful about the meal that you’re eating with that can really be very, very powerful and very helpful for a lot of people. Once you start to learn more about how your body responds to things, you don’t even need to look as often. You’ll know and you’ll be more aware of how you’re feeling after these things. Education is really the biggest thing here.

20:04 – Sleep and glucose levels are closely tied

Poor sleep is associated with higher glucose variability the next day, but noticing patterns in your behavior during the day and how they affect your sleep can help you get on top of it.

I think specifically with sleep actually. I’ll look at people’s data and it’s like, oh, are you logging too close to bedtime? People who eat between the hours of midnight and 4:00 AM get, on average, fewer hours of sleep and have more glucose variability the next day. We’ve certainly established in our data that if you get less than six hours of sleep, you’ll on average have higher variability the next day. Or I think, from a research perspective, I’d like to get to a place where we can say okay, your sleep is being disrupted because you have things like chronic stress, or you’re drinking coffee too late in the day. You had coffee at 1:00 PM and every time you do that you get less than this many hours of sleep. Or are you mostly sedentary throughout the day, like maybe you’re not getting enough exercise and that’s why your sleep is disrupted? Are you getting too much exercise?
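The six-hour finding is a grouped comparison: split nights by sleep duration, then compare the next day's glucose variability. The sketch below shows that shape under stated assumptions. The records are invented, the helper names are hypothetical, and population standard deviation is used as a stand-in variability metric because the episode does not say which measure Levels uses.

```python
from statistics import mean, pstdev

# Hypothetical nightly records: hours slept, then the NEXT day's glucose
# readings in mg/dL. The numbers are invented for illustration only.
nights = [
    {"sleep_hours": 5.0, "next_day_glucose": [80, 120, 80, 120]},
    {"sleep_hours": 5.5, "next_day_glucose": [90, 130, 90, 130]},
    {"sleep_hours": 7.5, "next_day_glucose": [95, 105, 95, 105]},
    {"sleep_hours": 8.0, "next_day_glucose": [100, 110, 100, 110]},
]

SHORT_SLEEP_HOURS = 6.0  # the six-hour cutoff mentioned in the episode


def day_variability(readings):
    # Assumed metric: population standard deviation of one day's readings.
    return pstdev(readings)


def mean_variability_by_sleep(records, threshold=SHORT_SLEEP_HOURS):
    """Average next-day variability for short (< threshold) vs. longer sleep."""
    short = [day_variability(r["next_day_glucose"])
             for r in records if r["sleep_hours"] < threshold]
    longer = [day_variability(r["next_day_glucose"])
              for r in records if r["sleep_hours"] >= threshold]
    return mean(short), mean(longer)


short_sleep, longer_sleep = mean_variability_by_sleep(nights)
print(f"<6h: {short_sleep:.1f} mg/dL SD, >=6h: {longer_sleep:.1f} mg/dL SD")
```

A comparison like this shows an association, not a cause; as Ben notes later in the episode, correlation does not equal causation, so factors like stress or late caffeine would need to be examined separately.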

22:51 – Context matters

The same food can affect different people’s glucose levels in different ways, but context is important in understanding why. Exercise and food pairings can have a big impact.

So another example, I could tell you peanut butter, which is one of our most logged ingredients, has an average zone score of seven, but there are people who get ones when they log peanut butter and there are people who get tens. So all this stuff really depends on context, on what you’re eating it with, are you moving around afterwards, are you going for a walk? I forget, someone was saying in terms of Thanksgiving for example, they were eating their Thanksgiving meal as they were entertaining and walking around the kitchen and they didn’t really spike. Then the next day when they ate the leftovers, the same exact meal, they were sitting on the couch watching football and their glucose was going crazy. So all of this stuff really, really does depend on what you’re doing and what’s going on with you specifically.
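How an average of seven can hide scores of one and ten is easy to see with a small aggregation. This is a hypothetical sketch: the log records, the `walked_after` flag, and the helper names are invented, and it assumes, as the sushi versus sashimi comparison in the episode implies, that a higher zone score is better. It is not how Levels computes zone scores.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical food logs: (ingredient, zone score from 1 to 10, walked afterwards?).
# Invented values chosen so peanut butter averages 7 but ranges from 1 to 10.
logs = [
    ("peanut butter", 10, True),
    ("peanut butter", 9, True),
    ("peanut butter", 1, False),
    ("peanut butter", 8, False),
    ("sushi", 4, False),
]


def summarize(records, ingredient):
    """Mean, min, max and count of zone scores for one ingredient."""
    scores = [score for name, score, _ in records if name == ingredient]
    return {"mean": mean(scores), "min": min(scores),
            "max": max(scores), "n": len(scores)}


def mean_by_context(records, ingredient):
    """Average zone score split by whether the member walked after eating."""
    groups = defaultdict(list)
    for name, score, walked_after in records:
        if name == ingredient:
            groups["walked" if walked_after else "no walk"].append(score)
    return {context: mean(scores) for context, scores in groups.items()}


print(summarize(logs, "peanut butter"))
print(mean_by_context(logs, "peanut butter"))
```

Reporting the spread and a contextual split next to the mean is what lets an insight speak to an individual's circumstances; real logs would also need other context, such as what the food was paired with.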

27:17 – Data scientists are technology translators

Helena sees data science as a translator role between technology and people. There is so much data out there and data scientists can discover how it can be most useful.

To give you an analogy, if you think about the job of a translator, they ingest information in one language, very quickly decide what parts to translate, because not all languages have linear translations, and then regurgitate the information in a way that the third party will understand. Data scientists are the translators between technology and people. And in a world where we consume and produce unimaginable amounts of data, it’s an incredible responsibility, and I love it. And the reason why is because big data and more specifically big data in healthcare often presents the most challenging puzzles because of the dynamic and vast nature of the data we’re dealing with. But the possible solutions offer the most rewarding outcomes. Like, with the ability to improve the lives of people all over the world, and that’s something that I’m deeply passionate about.

30:45 – Not all data is useful

We produce huge amounts of data each day and most of it isn’t useful. Companies have to determine what data matters the most for their customers and leave out the rest.

It’s things like clicks, where someone is clicking around on a webpage. That might not be relevant to everyone. So I think companies need to decide okay, what is going to be the most relevant to us, what’s going to help us achieve our goals and what’s going to be more useful for our members? And that’s something that I think a lot about when it comes to collecting data at Levels, because there is such a thing as too much data, so we want to be really mindful of what data we collect, how we’re going to use it, and how we’re going to store it. Like you said, privacy is also something I think a lot about.

34:55 – Balance the amount of data collection and insights

People should have a choice about how much data they’re willing to provide and companies should provide insights even with the minimum.

I think there’s a healthy balance between asking people for data and turning that into insights. I mean, if you’re a Levels member and you’re taking the time to log every single ingredient in your salad, you should be rewarded in some way for that. You should get as much insight as possible from that. If you’re someone that maybe you don’t like to log or you just forget sometimes, you should still be able to get value out of using Levels. So I think that there’s a balance that needs to be achieved. We need to be able to give people insights and use whatever data they’re willing to give us while obviously being mindful of data security and privacy and things like that.

39:27 – Tagging cleans up the data

A databank becomes a lot more effective and useful when the data is standardized and easy to draw upon. Levels has started using tags to clean up the log data from members.

So a lot of cleaning right now goes on in the backend to actually pull and say okay, yeah, our members log peanut butter and that has an average zone score of seven. I have to go through and clean every single log to get that insight so that it’s all standardized and reads as peanut butter and not like PB or peanutbutter as one word or just peanut or something like that. So what tagging is going to do is standardize all of the text on the backend. I’m so excited because we’ll be able to do things like link it on the backend, where if you log mac and cheese, any model I implement will know that that’s pasta. Or if you log sourdough versus white bread, we’ll know that both of those are breads but they’re different types of bread and you’re going to respond differently to different types of bread. Or we’ll know that a vegan hamburger, or something like that, is in the hamburger category but has a very different composition from regular hamburgers.
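The standardization Helena describes amounts to mapping free-text spellings to a canonical tag, then mapping tags to broader categories. The sketch below is a minimal illustration, assuming a hand-maintained alias table; the `ALIASES` and `CATEGORIES` tables and the `normalize` and `tag` functions are hypothetical and do not reflect how Levels' tagging is actually built.

```python
import re

# Hypothetical alias table: cleaned raw text -> canonical tag.
ALIASES = {
    "pb": "peanut butter",
    "peanutbutter": "peanut butter",
    "peanut butter": "peanut butter",
    "mac and cheese": "mac and cheese",
    "mac n cheese": "mac and cheese",
    "sourdough": "sourdough bread",
    "white bread": "white bread",
    "vegan burger": "vegan hamburger",
    "vegan hamburger": "vegan hamburger",
    "hamburger": "hamburger",
}

# Hypothetical parent categories: canonical tag -> broader food group.
CATEGORIES = {
    "peanut butter": "nut butter",
    "mac and cheese": "pasta",
    "sourdough bread": "bread",
    "white bread": "bread",
    "vegan hamburger": "hamburger",
    "hamburger": "hamburger",
}


def normalize(raw):
    """Lowercase, replace punctuation with spaces, collapse whitespace, look up."""
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", raw.lower())
    cleaned = " ".join(cleaned.split())
    return ALIASES.get(cleaned)  # None means the entry needs manual review


def tag(raw):
    """Return (canonical tag, category), or (None, None) if unrecognized."""
    canonical = normalize(raw)
    return canonical, (CATEGORIES.get(canonical) if canonical else None)


for entry in ["PB", "Peanut-Butter", "mac n cheese", "Sourdough", "peanut"]:
    print(entry, "->", tag(entry))
```

A bare "peanut" is deliberately left unmapped here, since it could mean peanuts or peanut butter; queuing ambiguous entries for review rather than guessing is one way to keep the standardized data clean.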

43:41 – Connect glucose with stress

All of the continuous biometric data that Levels is collecting could help connect biomarkers to behaviors and states such as stress, and point toward interventions that reduce it.

The other thing I’m super excited about is all of the continuous biometric data that we have, and something else that we can do with our data, and that we will do, is propel research forward. There isn’t a ton of research on glucose in non-diabetics, and something Taylor, our head of research, thinks a lot about is: can we connect these continuous biomarkers to behavior? If I’m feeling stressed, what’s happening in my body and can we quantify it? In other words, can we say if your glucose curve looks like this, you’re probably feeling stressed or maybe you’re feeling angry? Lots of members have told us that they see glucose spikes in response to emotional stress, and if this happens over and over, maybe you’re someone who experiences chronic stress, and here are all of the implications of that and interventions to help you make behavioral and biological changes.

Episode Transcript


Helena Belloff (00:06):

So I think that there’s a balance that needs to be achieved. We need to be able to give people insights and use whatever data they’re willing to give us, while obviously being mindful of data security and privacy and things like that. And so we want to avoid a situation where we’re keeping track of someone’s data and they have no idea how we’re using it or how it’s being stored or who’s looking at it.

Ben Grynol (00:45):

I’m Ben Grynol, part of the early startup team here at Levels. We’re building tech that helps people understand their metabolic health, and this is your front row seat to everything we do. This is A Whole New Level.

Ben Grynol (01:11):

Any time you start to track data, it compounds pretty quickly. Pair that with exponential growth in the amount of data that is coming in, and you get a surplus of data pretty quickly. At this point, Levels has one of the largest sets of data pertaining to metabolic health in the world: glucose data from outside the type 1 and type 2 diabetic community. There’s so much to learn from the way that people react, the way that they metabolize certain foods. There are all these individual differences: genetics play a part, sleep plays a part, and things like cortisol play a part in metabolic health too.

Ben Grynol (01:51):

There are a number of other factors, but when you start to think about what does all of this data actually mean, what can we do with it? Well, that’s where things get really interesting. We’re still very early in our journey with over 100 million health-related data points collected, and as we start to scale, as more members come on board, as we build out Levels and launch internationally, well, that dataset is going to compound pretty quickly. So the question is, what do you do with all this data?

Ben Grynol (02:17):

So Helena Belloff, who leads our data science efforts internally, and I sat down and discussed some of the implications around data collection. What can we actually do with it when we start running regressions against it? We can always find correlations between X and Y and we can extrapolate that many different ways. But the interesting thing is sometimes there is consistency in the dataset. Here’s where we kick things off.

Ben Grynol (02:48):

So, we have a ton of data and the data keeps compounding. We keep getting more and more and more. And some of the things that we track are things like food logs, things like health data points and things like glucose data points. So the question becomes, what do we do with all this data? Like now we have all this data. What are things that we can do with it, what are things that you’re seeing in the dataset, and how can we think about using this data to help everybody that is a member and just help people through things like education?

Helena Belloff (03:25):

Yeah, I mean I think we have the largest non-diabetic glucose dataset in the world. Which is pretty insane. We collectively have around 16.5 million hours of glucose data. And we have a unique opportunity because we have enough context about any individual member to give you recommendations based on your data.

Helena Belloff (03:56):

So I’m doing a member research project right now where I’m asking a handful of members to allow me to go through their data and send them insights throughout the week as they log food and scan their sensor. And I was talking with a member the other day about his afternoon meals because he tends to spike after lunch, which is not uncommon. And I’m experimenting with what types of insights are going to be the most useful for our members. And I gave him sort of two options.

Helena Belloff (04:29):

The first insight I gave him was okay, you spike after lunch. Here’s some general advice for how to avoid that afternoon slump, which is choose meals that are less carb heavy and do some exercise or go for a walk after lunch to minimize that big spike and subsequent drop in blood sugar you’re experiencing.

Helena Belloff (04:53):

The second insight I gave him was for you, specifically, you should go for a walk after lunch and you may want to switch up your carbs. Because I saw in his data that when he went for a walk after his afternoon meal, his glucose returned to normal range much faster, and I was able to give him examples of meals after which he went for a walk and where this actually made a difference. And I was able to quantify the exact difference in how his blood glucose responded when he went for a walk, versus a similar meal where he didn’t go for a walk.
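One simple way to express "returned to normal range much faster" is the number of minutes from the post-meal peak until glucose drops back under a threshold, compared between a meal followed by a walk and a similar meal without one. This is a sketch under assumptions: the 140 mg/dL upper bound, the 15-minute sampling interval, the readings, and the function name `minutes_to_return` are all hypothetical, and the episode does not say which metric Helena actually used.

```python
# Hypothetical post-meal glucose readings in mg/dL, one every 15 minutes.
SAMPLE_MINUTES = 15   # assumed sensor sampling interval for this sketch
UPPER_NORMAL = 140    # assumed upper bound of "normal range", mg/dL


def minutes_to_return(readings, upper=UPPER_NORMAL, step=SAMPLE_MINUTES):
    """Minutes from the post-meal peak until glucose is at or below `upper`.

    Returns None if glucose never comes back under `upper` within the window.
    """
    peak_idx = max(range(len(readings)), key=readings.__getitem__)
    for i in range(peak_idx, len(readings)):
        if readings[i] <= upper:
            return (i - peak_idx) * step
    return None


# Two similar lunches: one followed by a walk, one followed by sitting.
with_walk = [95, 150, 160, 135, 110, 100]
without_walk = [95, 155, 170, 165, 150, 138, 120]

walk_minutes = minutes_to_return(with_walk)
sit_minutes = minutes_to_return(without_walk)
print(f"walk: {walk_minutes} min, no walk: {sit_minutes} min, "
      f"difference: {sit_minutes - walk_minutes} min")
```

Peak height or area above baseline would be equally reasonable metrics; time back to range is used here only because it matches the wording in the transcript.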

Helena Belloff (05:27):

And I noticed that certain carbs that other people tend to do well with, like bulgur for example, which is less processed than most grains and contains more fiber and nutrients, were not working for him. And I was able to say, “Okay, try switching up your carbs. Try other sources of fiber with breakfast, like beans or lentils or avocado.”

Helena Belloff (05:51):

And I asked him what’s your feedback, what’s most useful here? And he obviously picked the ultra personalized insights I provided over “this is in general what you can do.” And he kept asking me, wait, is this actually me, are these suggestions and insights actually specific to my body and my data? And I told him, “Yeah. This is you and this is how you compare to other people your age and here’s what other similar people are eating.” He was just blown away by how much we can do with this data. And I learned a lot too, certainly, about what we have and how many different directions we can steer this in.

Ben Grynol (06:35):

Yeah. It’s interesting because people want to know how they compare to the mean. So looking at… The mean being the community. How do other people fare with sweet potato? Whether or not a person can… And I’m using that as an example. But whether or not a person can or cannot metabolize sweet potato well regardless of the way that it’s paired with fat, fiber, protein. They just want to know like in general. And a lot of this is qualitative from some of the community calls that we’ve done. But they just want to know, am I an outlier? Because I think that’s human nature, it’s like we want to know am I in the middle of the bell curve? Am I in the long tail?

Ben Grynol (07:17):

So it’s this thought around large datasets allow us to provide these insights and surface them, but then you can get… you go from the macro to the micro and you can deconstruct everything down to these little principles for this individual that you did where you’re like, “Eh, bulgur’s not really working for you personally. Try this instead.” Then that reverts back to the macro of the dataset.

Ben Grynol (07:42):

On average, people are finding that… or from what we see, on average there is a glucose response of, whatever, N, right? That helps people drastically to understand how they can make these changes. Then they need to see that data a few times over to go, “Oh, I get it. The percentage delta between walking and not walking is this. I guess I should keep doing this.” So it’s creating the feedback loops for people to see it, to feel it, to understand it, and then actually see a meaningful difference.

Helena Belloff (08:16):

Yeah. Then in terms of education, I mean this member’s pretty… he reads our blog and listens to these podcasts pretty religiously. So he’s very much aware of what we’re talking about health wise and all the articles we post. So he was less interested in the general advice but I do think for a lot of people who are coming in who have no idea what metabolic health is, I mean I certainly knew nothing about nutrition or any of that when I came into this job and I’ve worked in healthcare my entire career. It’s helpful to know like okay, here’s in general how you can avoid feeling fatigued and irritable at 2:00 PM, 3:00 PM at your desk. Here’s just some general things you could try out to begin with. Maybe here’s a little short article about why these things work.

Helena Belloff (09:14):

Then as you log more food and you learn more about how you respond to certain things, we can come in and say, “Hey, this part of that general advice is something that works really, really well for you. Here’s something else that you might want to try because we noticed in your data that you’re not doing very well with chickpea pasta,” or something like that, “and other people similar to you have had more luck with these alternatives. Try some kelp noodles,” or something.

Helena Belloff (09:48):

So yeah. There’s a massive opportunity to shift more towards personalized healthcare because at the end of the day everyone is different. We can’t just throw a dart at the board and hope that it hits something that’s meaningful for everyone. That’s just never going to happen.

Ben Grynol (10:14):

So when it comes to different types of food, let’s categorize this noun food as things that are actually real food, not highly processed food like Skittles. We’ll get into that; we don’t want to dunk on Skittles, but let’s dunk on them, why not. That’s not food. But you can extrapolate certain things from it, so the counterpoint to what you’re saying, that everyone is different, is that we have to look at what insights, like what is going to work for different people when we talk about real food. But we can extrapolate certain things forward, and you’ve looked into the dataset to see what doesn’t work for people. And you don’t need a dataset of 1.9 million food logs to start to draw some of these insights. Anecdotally, people know Skittles aren’t good for you. Anecdotally, people know don’t go crushing Big Macs all day. Those are things that are not going to give you a good metabolic response.

Ben Grynol (11:14):

But when you start to have data, and you’ve looked into the dataset, there are things that you see and you’re like okay, on average the mean is this for these types of foods. So there’s sort of two sides to it. Like, what are some of the things that you’ve seen where you’re just like wow, I knew to avoid that but people should really avoid that? Then what are some of the things that you’ve seen that were more eye-opening, that seemed like they would be okay but give, on average, higher metabolic responses?

Helena Belloff (11:45):

Yeah, I think one of the big things for me, because I eat a lot of it, was sushi. It spikes a lot of people. And every time I eat it, my glucose is crazy.

Ben Grynol (11:59):

What’s that number look like as far as the average?

Helena Belloff (12:03):

I believe the average zone score is somewhere in like the eight, nine… Sorry, not the eight, nine, like the four or five for sushi. But when you compare it with sashimi that doesn’t have rice, it’s like eights and nines.

Helena Belloff (12:24):

Then what a lot of people don’t realize, and I didn’t even know this until I started working here, is that soy sauce can secretly have a lot of sugar in it. And because I realized that I spike with sushi, I’ve tried just simple things: okay, let me opt for sashimi and let me limit the amount of soy sauce I use with it. And it’s actually worked. And that was my crazy aha moment. Wow, I can still eat these things but I can make these little tweaks and I end up feeling a lot better; I don’t get that sluggish, oh I’m so full, feeling after I eat sushi now.

Helena Belloff (13:05):

I think if you’re someone who’s logging pasta, let’s say down the road we put an insight in the app where it’s like okay, now you’ve logged pasta a bunch of times. We can model how you’re going to react and project out what’s going to happen if you eat pasta again. I’m still going to eat the pasta because I’m logging it. It’s in front of me, I’m going to eat it. But maybe there’s something I can do to mitigate the level of spike that is going to result from that. Maybe I take a walk right after or maybe I pair it with protein or maybe next time we have the capability to tell someone this is how you’re going to feel after you eat this. Maybe I still eat it but maybe next time I decide to make a different choice. Like it’s all about small behavior changes. I don’t think that we’re going to overhaul the food industry, but I think we’re going to change the way people are living and how they approach balance, I guess. If we’ve learned anything it’s that a balanced diet is a real thing and it works.

Ben Grynol (14:20):

Absolutely. What are some of the things that you’ve seen that weren’t necessarily a surprise but you know when you look at the dataset as far as like the food… and I don’t like to call it food but things like Skittles, what are some of the things in those categories as far as like the worst foods for somebody to consume, like the bottom, bottom, bottom. What are those, off the top of your head? I know we talked about it before, it’s things like Skittles, things like Big Macs, things like Egg McMuffins that just don’t give people a nice, flat glycemic response, if you want to call it that.

Helena Belloff (14:59):

Yeah. I think Big Macs are actually a pretty big one. One interesting thing I noticed, and I think I mentioned this on Friday Forum, was chicken McNuggets. People who eat them and don’t log any sort of honey mustard or sweet and sour sauce fare substantially better than people who log them with the honey mustard and the sweet and sour sauce, and that’s because those are basically all sugar. But it was such a drastic difference, it’s incredible. I don’t know the actual difference off the top of my head but I remember being actually shocked by what a difference just a little packet of sauce could make in your body’s response.

Ben Grynol (15:46):

Mm-hmm (affirmative). Have you seen anything along the lines of… So there are certain foods that are woven into the fabric of society. That being, let’s use things like pizza. Pizza is something that a lot of people enjoy, and the idea with everything that we do as a company is to give people insight into the way that their lifestyle choices can impact their metabolic health. It’s not to be prescriptive and say, “Never eat pizza.” As an example, it is, “Make sure that you approach things with balance, make sure that you understand when you do make certain choices that here are the implications of that.” Pizza is a good example of something that is probably not in anyone’s best interest to eat at lunch if they are going about their work day because the response is not going to help them get through a productive work day. So you can give them those insights.

Ben Grynol (16:38):

But have you seen things like pizza and salad, pizza and certain fat or protein, as far as the way people log it where you’re like wow, that’s a game changer as far as having that lens, and being able to surface that insight?

Helena Belloff (16:53):

Well, the funny thing about pizza actually is that a lot of people don’t spike with pizza because there’s cheese, there’s fat on it. Which I thought was super interesting. But again, like you said, it’s context, it’s education. It’s probably not in your best interest to eat a bunch of bread with sauce and cheese on it. But I think just having that information and a huge part of what we are trying to do is push research and push education forward because a lot of people just don’t know this stuff. I think pizza was one of the first things I was shocked by, where I was like wow, I don’t spike at all with pizza, it must be healthy. But I know that it’s not.

Helena Belloff (17:42):

I think the same thing could be said for alcohol, for example. That’s one that I think people sort of see as they go through Levels and as they log it, but if you have a glass of wine before a meal it might inhibit the level of spike that you’re getting. And that doesn’t necessarily mean that you should have alcohol with every meal. But understanding why that happens and why that might confound your results and being informed and mindful about the meal that you’re eating with that can really be very, very powerful and very helpful for a lot of people. Once you start to learn more about how your body responds to things, you don’t even need to look as often. You’ll know and you’ll be more aware of how you’re feeling after these things. Education is really the biggest thing here.

Ben Grynol (18:42):

Have you looked into anything pertaining to alcohol? And this gets a little bit hard as a data scientist, and we have to get into that as a term, but when you start to look at statistics, anyone who studies statistics knows correlation does not equal causation, and you can tell yourself that over and over and over again. But there is some sense of insight that can be taken from things like consuming alcohol. So have you looked at anything in the dataset where… and this is back to correlation does not mean causation, but when people consume alcohol we know that they will have disrupted sleep, from things like WHOOP. You looked into, or you were part of, the WHOOP and Levels team case studies where we did it together. And there were drastic differences as far as sleep quality when consuming alcohol and all these things.

Ben Grynol (19:37):

We know that poor sleep quality equals, on average, elevated glucose levels, even if there isn’t high variability, your average glucose levels throughout a day can be elevated if you had poor sleep quality. So are there things that you’ve seen with the dataset so far that you can, again, anecdotally start to look at some of the things where you go when people are consuming alcohol, this is sort of what’s happening over time?

Helena Belloff (20:04):

Yeah. And I think specifically with sleep actually. I’ll look at people’s data and it’s like, oh, are you logging too close to bedtime? People who eat between the hours of midnight and 4:00 AM get, on average, fewer hours of sleep and have more glucose variability the next day. We’ve certainly established in our data that if you get less than six hours of sleep, you’ll on average have higher variability the next day. Or I think, from a research perspective, I’d like to get to a place where we can say okay, your sleep is being disrupted because you have things like chronic stress, or you’re drinking coffee too late in the day. Like, you had coffee at 1:00 PM and every time you do that you get less than this many hours of sleep. Or are you mostly sedentary throughout the day, like maybe you’re not getting enough exercise and that’s why your sleep is disrupted? Are you getting too much exercise?

Helena Belloff (21:14):

I think anecdotally, so far, we’ve seen a lot of these things. And I’ve even seen with my own data, whenever I get minimal hours of sleep or poor sleep because my dog wakes me up at an ungodly hour, my glucose is all over the place, regardless of what I eat. So yeah, I’ve seen a ton of stuff.

Ben Grynol (21:38):

It gets challenging because you almost start off in a deficit in some cases, where you have poor sleep quality, you have higher variability, maybe your average glucose throughout that day is also at an elevated level. And then if you made a choice to eat something like, let’s say, pizza or sushi, right? Then your variability, again, is higher. Add in recovery and being sedentary, and we keep compounding these things.

Ben Grynol (22:06):

It doesn’t take that many days of compounding all of these, what seem like micro factors, to start to go wow, it’s getting harder to get to a steady state or something that is a little bit more stable. So yeah, it becomes a challenge to connect all of these dots that are somewhat unrelated, where we’re sitting now. And that’s back to this idea of personalization where you’re able to look into specific data points for individuals and go… once we can do that at scale, but when you can go wow, this is really what I’m seeing here and X might actually equal Y in this case, right? That’s where doing deeper analysis from a personalized data perspective is really, really neat.

Helena Belloff (22:51):

Yeah. So another example, I could tell you peanut butter, which is one of our most logged ingredients, has an average zone score of seven, but there are people who get ones when they log peanut butter and there are people who get tens. So all this stuff really depends on context, on what you’re eating it with, are you moving around afterwards, are you going for a walk?

Helena Belloff (23:20):

I forget who it was, but someone was saying, in terms of Thanksgiving, for example, that they were eating their Thanksgiving meal while they were entertaining and walking around the kitchen, and they didn’t really spike. Then the next day, when they ate the leftovers, the same exact meal, they were sitting on the couch watching football and their glucose was going crazy. So all of this really, really does depend on what you’re doing and what’s going on with you specifically.

Ben Grynol (23:54):

Yeah, it makes a difference; all of the factors make a difference. We have to touch on one thing, which we probably should have highlighted before: this idea of being a data scientist. That is officially your role. That is what you do. But more importantly, what exactly is a data scientist, and what does a data scientist do? If a person hasn’t heard the term before, it just sounds really smart, and it is very smart, but it’s one of those things that seems so foreign sometimes that you’re like, I’ve heard about it, but I have no idea what this person does.

Helena Belloff (24:26):

Yeah. I love this question. Because to answer this question I think we first need to talk about data in general, because a career in data science doesn’t always follow a linear path. Just to give you some context, I used to read really, really technical textbooks for fun. And I know, to a lot of the people listening, that probably sounds like their worst nightmare. But I’ve always loved numbers and puzzles and I really wanted to expand my technical skills.

Helena Belloff (24:59):

But now, I’m at a place in my career where I’ve been exposed to many different types of data and technologies, and I’ve started to view tech as more than a one-dimensional equation. And so I’ve started to read about it in other contexts, like economic, political, social, or health contexts. I’m reading this book right now called Blockchain Chicken Farm (shout-out to Jeremy’s wife for the recommendation). It’s about the political and social entanglements of technology in rural China. Without getting too much into it, it’s very interesting; I recommend it. But there’s a quote in it that loosely reads, “Code is words made executable. So we must take care in what we say.”

Helena Belloff (25:45):

As a data scientist, never has a statement sounded more correct to me. Technological innovation is accelerating rapidly, and the world produces and consumes… I’m going to give you another statistic. I think it’s something like 94 zettabytes of data as of 2022. That’s an almost unimaginable amount of data. And just to give you an example of how much data is out there, I saw this really interesting graph recently that showed media usage in one minute on the internet: 350,000 Instagram stories posted, 40 million WhatsApp messages sent, $250,000 sent by Venmo users, 500 people engaging on Reddit. And that all occurs within one minute on the internet.

Helena Belloff (26:43):

So to answer your question, what is a data scientist, with another question: how do we deal with massive, massive amounts of data in a way that is scalable, ethical, and useful? Like I said earlier, we can’t just throw a dart at a board and hope that we hit a target that means something to us. We need to be systematic, and we need to be smart about how we use data, how we store it, and how we present it. And that’s what a data scientist does.

Helena Belloff (27:17):

To give you an analogy, think about the job of a translator: they ingest information in one language, very quickly decide what parts to translate, because not all languages have linear translations, and then regurgitate the information in a way that the third party will understand. Data scientists are the translators between technology and people. And in a world where we consume and produce unimaginable amounts of data, it’s an incredible responsibility, and I love it. The reason is that big data, and more specifically big data in healthcare, often presents the most challenging puzzles because of the dynamic and vast nature of the data we’re dealing with. But the possible solutions offer the most rewarding outcomes, like the ability to improve the lives of people all over the world, and that’s something that I’m deeply passionate about.

Helena Belloff (28:17):

On the other hand, if you want to think about the quote unquote “one-dimensional” definition of a data scientist, we’re analytical data experts who have the technical skills to solve complex problems. Which is really just a super fancy way of saying we’re a bunch of curious nerds.

Ben Grynol (28:36):

A very good way of framing it. So we’re collecting so much data, and one thing that we’re very calculated about, if you want to call it that, is this lens on privacy and its importance. As a company, I mean, it is inherent to our values. But what are some of the challenges with the data? Is it the sheer quantity that we’re collecting? Like, when you start to think about it, hey, we’ve got all of this data, we’re still in beta as of February 24th, 2022, and the data is only going to keep increasing. And that’s just from the glucose data points we’re collecting.

Ben Grynol (29:20):

But as we hopefully start to collect other data points in the future, that starts to become really interesting, but it can also come with challenges. It’s amazing to have access to this dataset; there are technical challenges as far as warehousing the data, storing it, and making sure everything maintains a lens on privacy, but then you’re also given all these options. You can do whatever you want with the dataset. How do you think about some of these challenges that we might face when it comes to data?

Helena Belloff (29:54):

Yeah, I mean our data is already pretty nuanced. We have all this glucose data but we may not have logs to match it. Like, someone just might not log very often, and we won’t have that context for them. Or someone could log absolutely everything and we need to decide okay, do we want to pick apart absolutely every little ingredient in this log and use it all in some way? And that’s part of the job of a data scientist, is figuring out okay, what data do we store? Because a lot of the data on the internet and in the world is not usable, I would say most of it is not usable.

Ben Grynol (30:42):

Meaning why? Like why would it not be usable?

Helena Belloff (30:45):

It’s things like clicks, where someone is clicking around on a webpage. That might not be relevant to every company. So I think companies need to decide, okay, what is going to be the most relevant to us, what’s going to help us achieve our goals, and what’s going to be most useful for our members? And that’s something that I think a lot about when it comes to collecting data at Levels, because there is such a thing as too much data, and at the same time we want to be really mindful of what data we collect, how we’re going to use it, and how we’re going to store it. Like you said, privacy is also something I think a lot about. And yeah. I think all of this… I’m sorry, I don’t even remember the question. I just started ranting.

Ben Grynol (31:50):

No, no, no. It’s the idea around some of the challenges. But one of the things you touched on that’s interesting, there are a couple of things, so let’s go into a mini-digression around data usefulness, how it relates to Levels, and then get back on this train of thinking through challenges if we’re only collecting certain types of data.

Ben Grynol (32:11):

So one of the things is like, yeah, maybe not all data is useful. There is a way of capturing every car that drives by an office building or a house and you could notify a person every time. Like, hey, another car, another car. And it just becomes this thing. Or let’s say you decide because the volume is so high, we’ll give a notification for every 100 cars, whatever it is. It doesn’t really matter.

Ben Grynol (32:40):

We could essentially do the same thing. So, back to what you were saying at the beginning of the episode, if we’re capturing glucose data points every 15 minutes and we gave people a nudge for every single one, that’s a terrible product experience. Even if there is some insight, it’s our job to figure out how to provide value through the insights. We’ve got the data, but we’re not going, “Hey, data, data,” and just pushing it out to people. You could give some insight, but it’s the quality of the insight that matters, and that’s what you want to relate back to.

Ben Grynol (33:12):

Back to this idea of being able to do things with the data: right now, we’ve got roughly 5,000 weekly active users, representing anywhere from 45,000 to 50,000 food logs. And only, we’ll say, anywhere from 55 or 56 to 58% of people are logging food. So the challenge becomes, we can see if there is a glucose spike. That’s a data point we’ve got. But then we don’t have something to correlate it against, because you don’t know, was that a Wonder Bread sandwich or was that chickpeas? Those are very different types of food to consume.

Ben Grynol (33:59):

So then we can surface an insight that says, “Hey, you had a glucose spike of this much, and it lasted for this long.” Without being able to pair it back to a log, all we can say is, “Avoid the thing you ate.” That’s not as helpful as being able to say, “Oh, the thing that you ate was Wonder Bread, definitely don’t do that. Think about something else. We know homemade sourdough bread or wholegrain bread is going to give a different glycemic response than something like highly processed Wonder Bread.”
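The generic insight described here (“you had a glucose spike of this much, and it lasted for this long”) needs only the glucose stream, not the food log. A minimal sketch, assuming readings arrive every 15 minutes as mentioned earlier, a hypothetical per-person `baseline`, and a hypothetical rise threshold of 30 mg/dL; none of these values are Levels’ actual rules.

```python
from typing import List, Optional, Tuple


def describe_spike(
    readings: List[float],
    baseline: float,
    rise_threshold: float = 30.0,  # hypothetical cutoff, in mg/dL above baseline
    interval_min: int = 15,        # minutes between readings
) -> Optional[Tuple[float, int]]:
    """Return (largest rise above baseline, minutes of the longest above-threshold run).

    Duration is approximated as (number of consecutive readings) * interval_min.
    Returns None if no reading clears the threshold.
    """
    longest_run, current_run, peak_rise = 0, 0, 0.0
    for value in readings:
        rise = value - baseline
        if rise >= rise_threshold:
            current_run += 1
            longest_run = max(longest_run, current_run)
            peak_rise = max(peak_rise, rise)
        else:
            current_run = 0
    if longest_run == 0:
        return None
    return peak_rise, longest_run * interval_min


spike = describe_spike([95, 100, 130, 160, 140, 110, 98], baseline=95)
if spike:
    rise, minutes = spike
    print(f"Glucose rose {rise:.0f} mg/dL and stayed elevated for about {minutes} minutes.")
```

Pairing that output with a specific food log is what turns “avoid the thing you ate” into advice about Wonder Bread versus sourdough.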

Ben Grynol (34:26):

So again, being able to do things with this requires the behavior of actually making the food logs, so that we can start to think about how to surface those personalized insights. So there are all these different challenges, to your point, of what do we do when we’ve got a ton of data, how do we surface those insights, and then some of it relies on knowing that we need the right type of data to be able to help people.

Helena Belloff (34:55):

Right, yeah. And I think there’s a healthy balance between asking people for data and turning that into insights. I mean, if you’re a Levels member and you’re taking the time to log every single ingredient in your salad, you should be rewarded in some way for that. You should get as much insight as possible from that.

Helena Belloff (35:20):

If you’re someone who doesn’t like to log, or you just forget sometimes, you should still be able to get value out of using Levels. So I think that there’s a balance that needs to be achieved. We need to be able to give people insights and use whatever data they’re willing to give us, while obviously being mindful of data security and privacy and things like that.

Helena Belloff (35:56):

But if you do give us more context, we’ll be able to tell you a lot more. So it’s an interesting balance, and I think it’s kind of a hot-topic issue right now, because you have these big tech companies asking you, okay, share your location with us, allow this app to track you wherever you are. And often that gets snuck in there.

Helena Belloff (36:27):

Like, I, for example, was using this app where it was showing my exact location, like the city I was in, and I never remembered giving that app permission to access my location. And if I did, it was snuck in there somehow. And that’s pretty bad. So we want to avoid a situation where we’re keeping track of someone’s data and they have no idea how we’re using it, how it’s being stored, or who’s looking at it. So I think that’s definitely something that’s going to come to the forefront as we get more data and as we ramp up things like research.

Ben Grynol (37:08):

So of all the data that you’ve looked at, and you’ve looked at a ton of it now, what is your favorite thing that you’ve seen, given the size, the scale, and the number of different points in the dataset? What’s the thing that you looked at and thought, that is interesting?

Helena Belloff (37:29):

Oh, that’s a really difficult question because I feel like I’ve seen a lot of really interesting things.

Ben Grynol (37:35):

Pick top three or a few.

Helena Belloff (37:37):

Top three… Well, one interesting thing I’ve seen… Okay, well I’ll say this. One thing I’m really excited about is tagging, because what tagging is going to do for data science at Levels, I’m so excited.

Ben Grynol (37:58):

Let’s frame tagging, just so that everyone listening has a lens on it. So tagging is: you have a Caesar salad, and you don’t just log “Caesar salad,” you log all of the different inputs. So whether or not you have bacon on top, whether or not you have chicken, whether or not you have croutons, and, and, and. Breaking it down into inputs is the idea of tagging.

Helena Belloff (38:21):

Exactly. And through tagging we’ll be able to map specific things to categories. Because we have freeform logs, one of the challenges of providing these insights is that the data’s really unclean, and there are many, many different ways of cleaning text. NLP, natural language processing, is a vast area covering how humans interact with computers and how computers understand text and speech and all these things.

Helena Belloff (39:01):

So the freeform logs are great because you could literally put in whatever you want, but it makes my job really hard, because I don’t care about words like the “and” in “lettuce and tomato.” I just care about lettuce, tomato; that’s all I want. And I want it all lowercase with no punctuation. I don’t care if it’s tomatoes, I just care that it says tomato.

Helena Belloff (39:27):

So a lot of cleaning right now goes on in the backend to actually pull out something like, okay, yeah, our members log peanut butter and that has an average zone score of seven. I have to go through and clean every single log to get that insight, so that it’s all standardized and reads as peanut butter, and not PB, or peanutbutter as one word, or just peanut, or something like that.
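The cleaning described here (lowercase, no punctuation, filler words like “and” dropped, plurals collapsed, variants like PB mapped to one name) might look roughly like this. It’s a deliberately naive sketch: the `ALIASES` table, the splitting rules, and the plural handling are all hypothetical, and a real pipeline would need a much larger vocabulary.

```python
import re
from typing import List

# Hypothetical alias table mapping logged variants to one standard name.
ALIASES = {
    "pb": "peanut butter",
    "peanutbutter": "peanut butter",
}

# Split a free-form log into items on commas, semicolons, "&", "and", or "with".
SPLIT_PATTERN = re.compile(r",|;|&|\band\b|\bwith\b")


def singularize(word: str) -> str:
    """Very limited plural handling: berries -> berry, tomatoes -> tomato."""
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"
    if word.endswith("oes"):
        return word[:-2]
    return word


def normalize_log(text: str) -> List[str]:
    """Turn one free-form food log into a list of standardized item names."""
    items = []
    for part in SPLIT_PATTERN.split(text.lower()):
        part = re.sub(r"[^\w\s]", " ", part)  # drop punctuation and emoji
        phrase = " ".join(singularize(w) for w in part.split())
        if phrase:
            items.append(ALIASES.get(phrase, phrase))
    return items


print(normalize_log("Lettuce and Tomatoes, PB!! 🥜"))
```

Rule-based cleaning has obvious limits: `normalize_log("mac and cheese")` splits into `["mac", "cheese"]`, which is exactly the kind of ambiguity that structured tagging is meant to avoid.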

Helena Belloff (39:57):

So what tagging is going to do is sort of standardize all of the text on the backend. I’m so excited, because we’ll be able to do things like link it on the backend, so that if you log mac and cheese, any model I implement will know that that’s pasta. Or if you log sourdough versus white bread, we’ll know that both of those are breads, but they’re different types of bread and you’re going to respond differently to them. Or if you log a vegan hamburger, we’ll know it’s in the hamburger category but has a very different composition from regular hamburgers. Bacon is a really good example of that as well. So many possibilities with that, and oh my god, I’m so excited.
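Once items are standardized, the linking described here is essentially a lookup from an item to a category plus a variant. A minimal sketch, assuming a small hand-written `TAXONOMY`; the entries (including treating a plain hamburger as the “beef” variant) and the `Tag` structure are hypothetical and only mirror the examples in the conversation.

```python
from typing import NamedTuple, Optional


class Tag(NamedTuple):
    category: str  # broad group a model can reason about, e.g. "bread"
    variant: str   # the specific type, which may behave very differently


# Hypothetical taxonomy built from the examples in the conversation.
TAXONOMY = {
    "mac and cheese": Tag("pasta", "mac and cheese"),
    "chickpea pasta": Tag("pasta", "chickpea"),
    "sourdough bread": Tag("bread", "sourdough"),
    "white bread": Tag("bread", "white"),
    "hamburger": Tag("hamburger", "beef"),
    "vegan hamburger": Tag("hamburger", "plant-based"),
}


def tag_item(item: str) -> Optional[Tag]:
    """Look up a standardized item; returns None when the item isn't in the taxonomy."""
    return TAXONOMY.get(item.strip().lower())


def same_category(a: str, b: str) -> bool:
    """True when both items are known and share a category (their variants may differ)."""
    ta, tb = tag_item(a), tag_item(b)
    return ta is not None and tb is not None and ta.category == tb.category


print(tag_item("Mac and Cheese"))
print(same_category("sourdough bread", "white bread"))
```

Keeping category and variant separate lets an analysis pool all breads when data is sparse, while still distinguishing sourdough from white bread for a member who logs both. An unrecognized entry like “bar” comes back as `None`, which is the kind of log that still needs more context.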

Ben Grynol (40:51):

Cleaning data is something that is equal parts art and science. I apologize, I’ve probably thrown an emoji or two into the dataset for logging, which is-

Helena Belloff (41:03):

Oh it’s so funny.

Ben Grynol (41:05):

But the funny thing is, when you start to get into some of these things with nuanced logging, and you’re looking through a dataset and cleaning it, you can even start to extract insights around things like brands. So you see how coffee performs versus Bulletproof coffee, or you start to see Wonder Bread versus just bread or sourdough bread or whatever it might be. Sometimes we see that people will log… I think it’s probably more prominent with brands that have some notoriety, so people might log LMNT, the electrolyte drink, but they’re not going to log Wonder Bread. Let’s say that. Because Wonder Bread isn’t… I’m making this up, but Wonder Bread’s not a brand that people anchor on, where they aspire to have this Wonder Bread experience. It’s a lot different with brands that people look up to, and I think that’s one of the things that we see.

Ben Grynol (42:06):

When people look up to certain brands, like Athletic Greens, you’ll see logs for that, because people feel that that’s a part of their identity. It’s interesting how that comes about in logging, though: the way that people think about the food that they consume and then the way that they log it. For Athletic Greens, you could essentially just log “shake” or “green drink” or something, right? There’s no real need to log it a certain way, but it’s very helpful when we start to see LMNT, Athletic Greens, whatever it is. Because you go, wow, I can actually do something with this data.

Helena Belloff (42:40):

Exactly. And brands are a big one. I mean, we have the classic example, and I feel like Casey has mentioned this in some podcasts, RXBARs, KIND bars, Quest bars. All of these name brand bars. And then there’ll be situations though where people are just like bar. And I don’t know what that is.

Ben Grynol (43:04):

Snickers or a Mars? It could be anything.

Helena Belloff (43:06):

It literally could be anything. So I love that we are thinking about how we get more context out of these very, very messy logs, standardize things, streamline things, and make it easier, so that when I do implement some sort of model, I make personalized smart… I’m calling them quote unquote “smart” recommendations.

Ben Grynol (43:36):

Oh you can call them smart. They’re very smart.

Helena Belloff (43:41):

For people, it will be a lot more scalable for us. That’s one of the things right now that I’m just like… I cannot express how excited I am for the tagging project and what’s going on with that. The other thing I’m super excited about is all of the continuous biometric data that we have and something else that we can do with our data and that we will do is propel research forward. Like, there isn’t a ton of research on glucose in nondiabetics, and something Taylor, our head of research, thinks a lot about, is can we connect these continuous biomarkers to behavior? If I’m feeling stressed, what’s happening in my body and can we quantify it? In other words, can we say if your glucose curve looks like this, you’re probably feeling stressed or maybe you’re feeling angry. Lots of members have told us that they see glucose spikes in response to emotional stress, and if this happens over and over maybe you’re someone that experiences chronic stress and here are all of the implications of that and interventions to help you make behavioral and biological changes.

Helena Belloff (45:04):

I’m coming straight out of research; I was at Mount Sinai before this doing Alzheimer’s research, and these are the things that really, really excite me about our data. I was even listening to a Levels podcast where Casey said something like, all health and disease comes from either cellular function or cellular dysfunction, and cells need specific things to function properly, and also the avoidance of other things.

Helena Belloff (45:40):

The food that we eat has tons of chemicals in it. Some of those chemicals are things that our cells need to function properly, but some aren’t. There are so many chemicals in food that we don’t even know about, and Casey also mentioned this. But as we ramp up research in this space, it would be so interesting to see if foods that contain specific chemicals are implicated in biological pathways involved in metabolic dysfunction, for example. And that’s so exciting to me. And can we link a glucose curve to a behavior or an event that happens? So cool.