🍯 Sweet but Healthy? Evaluating Honey vs. Sugar in Recipes

Group Members: Dylan Dsouza, Jared Wang, Kyle Zhao, and Christian Kumagai

This notebook investigates whether honey-based desserts are meaningfully healthier than sugar-based desserts in real-world recipes. Using recipe-level data from Food.com's publicly available dataset of over 230,000 recipes, the "health premium" of honey is quantified by comparing the caloric content, sugar (% Daily Value), and total fat (% Daily Value) of desserts that use honey as their primary sweetener versus those that use refined sugar.


Abstract

Honey is often perceived as a healthier alternative to refined sugar, particularly in desserts where it is commonly presented as a more natural sweetener. This project examines whether desserts that use honey as the primary sweetener differ from those that use sugar in terms of calories, sugar (% Daily Value), and total fat (% Daily Value).

Using the Food.com recipe dataset, we filtered over 230,000 recipes to retain only those tagged as desserts and containing either honey or sugar as the sole primary sweetener. After cleaning the ingredient and nutrition data and removing caloric outliers, the final dataset included 30,986 recipes, of which 750 were honey-based and 30,236 were sugar-based. We used exploratory data analysis to compare group distributions and summary statistics, followed by Welch’s t-tests and permutation tests to evaluate differences in nutritional outcomes.

Across all three measures, honey-based desserts showed lower median values than sugar-based desserts. Statistical testing found that sugar-based desserts had significantly higher sugar content and also higher average calories and total fat, which was contrary to our original hypothesis. Further analysis comparing theoretical sweetener substitution with the observed differences in the dataset suggests that these nutritional differences are not explained by the sweetener alone. Instead, the results indicate that honey and sugar tend to appear in different types of desserts, so sweetener choice appears to reflect broader recipe composition rather than directly determine nutritional quality.


Research Question

Do dessert recipes that use honey as the primary sweetener differ in caloric content, sugar content, and total fat (normalized to FDA daily values) compared to desserts that use sugar as the primary sweetener?

To control for recipe type, we will filter recipes using the “desserts” tag and classify them as honey-based or sugar-based using their ingredient lists, restricting the sample to recipes that contain only one primary sweetener. We will then define the nutritional outcomes, specifically total calories, sugar (% Daily Value), and total fat (% Daily Value), extracted from the nutrition column. For the comparison, we intend to use independent-samples t-tests to compare mean nutritional values between honey and sugar desserts before conducting secondary analyses.


Background and Prior Work

Desserts are often viewed as indulgent foods, yet many consumers attempt to make them "healthier" by replacing refined sugar with alternative sweeteners such as honey. Honey is frequently marketed and perceived as more natural or wholesome, and this perception commonly appears in online dessert recipes and food blogs. However, research on the health halo effect suggests that when foods are framed as natural or organic, consumers tend to perceive them as healthier, and sometimes even lower in calories, despite similar objective nutritional content. Perceived healthfulness therefore does not necessarily correspond to measurable nutritional differences, which motivates an empirical evaluation of whether honey-based desserts are actually nutritionally distinct from sugar-based desserts.

From a nutritional science perspective, controlled studies comparing caloric sweeteners have often found that honey and refined sugars produce similar short-term metabolic responses when consumed at equivalent carbohydrate levels. These findings suggest that simply substituting one caloric sweetener for another does not inherently improve nutritional outcomes unless total quantities or accompanying ingredients change. Although honey differs in composition from granulated sugar, both function primarily as carbohydrate sources, and ingredient-level differences do not automatically translate into meaningful differences in full recipes.

Large-scale computational research has demonstrated that online recipe datasets can be systematically analyzed to investigate ingredient and nutrition patterns. In particular, Majumder et al. (2019) introduced and analyzed a processed Food.com dataset containing approximately 180,000 recipes and 700,000 user interactions, showing that the dataset supports reproducible computational modeling and structured filtering. [1] Although their work focused on personalized recipe generation rather than nutritional comparison, it validated Food.com as a robust and well-structured resource suitable for large-scale quantitative analysis.

To compare nutritional outcomes in a standardized and interpretable way, we rely on federal nutrition labeling guidelines. The U.S. Food and Drug Administration defines Daily Values and Percent Daily Value as reference standards indicating how much a nutrient in one serving contributes to recommended daily intake. [2] Using normalized measures such as sugar percent Daily Value and total fat percent Daily Value allows comparisons across recipes that may differ in serving size or formulation. Additionally, USDA FoodData Central provides authoritative nutrient composition data for foods such as honey and granulated sugar, supporting ingredient-level comparisons of energy and sugar content. [3]

Despite this body of work, an important gap remains. Prior nutrition research examines sweeteners under controlled experimental conditions, and computational recipe research validates large-scale datasets, but there has been limited empirical analysis comparing honey-based and sugar-based desserts under strict filtering conditions within real-world recipe data. Furthermore, research on health perception demonstrates that natural labeling can bias consumer judgments, yet it does not directly test whether such perceptions align with actual nutritional differences in complete recipes. Our project addresses this gap by applying explicit inclusion criteria, restricting to dessert-tagged recipes containing a single primary sweetener, and conducting statistical comparisons of caloric content, sugar percent Daily Value, and total fat percent Daily Value between honey-based and sugar-based desserts.

We also acknowledge that honey-based desserts may systematically differ in other ingredients, such as the inclusion of nuts, oils, or butter, meaning any observed differences may reflect broader recipe-style patterns rather than the sweetener alone. Accordingly, we will clearly document our filtering decisions and interpret results with this potential confounding in mind.

  1. Majumder, B. P., Li, S., Ni, J., & McAuley, J. (2019). Generating Personalized Recipes from Historical User Preferences. University of California, San Diego.
  2. U.S. Food and Drug Administration (FDA). Daily Value and nutrition labeling reference. https://www.fda.gov
  3. USDA FoodData Central. Nutrition entries for honey and granulated sugar. https://fdc.nal.usda.gov

Hypothesis

We hypothesize that desserts using honey as the primary sweetener will have higher overall caloric content and slightly higher total fat compared to sugar-sweetened desserts when normalized to FDA daily values, while sugar (% Daily Value) may be comparable. Although honey is often perceived as a healthier alternative, it is more calorically dense than sugar by volume (about 64 kcal per tablespoon of honey, compared with about 48 kcal per tablespoon of granulated sugar), and recipes claiming to be 'healthy' may not reduce quantities enough to offset this difference. Additionally, honey-based desserts often include complementary fat sources (e.g., butter, oils, nuts), which may contribute to comparable or higher fat content.
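The per-tablespoon figures quoted above depend on how much a tablespoon of each sweetener weighs. The following is a minimal sketch of that arithmetic; the gram weights per tablespoon are approximate assumptions introduced here for illustration (they are not taken from the dataset), and the dictionary and function names are hypothetical.

```python
# Calories per tablespoon as quoted in the hypothesis text.
KCAL_PER_TBSP = {"honey": 64.0, "sugar": 48.0}

# ASSUMPTION: approximate weight of one tablespoon in grams.
# These values are illustrative and should be checked against USDA FoodData Central.
GRAMS_PER_TBSP = {"honey": 21.0, "sugar": 12.5}


def kcal_per_gram(sweetener: str) -> float:
    """Convert the per-tablespoon figure to calories per gram."""
    return KCAL_PER_TBSP[sweetener] / GRAMS_PER_TBSP[sweetener]


for name in ("honey", "sugar"):
    print(f"{name}: {KCAL_PER_TBSP[name]:.0f} kcal/tbsp, "
          f"{kcal_per_gram(name):.2f} kcal/g")
```

Under these assumed weights, honey carries more calories per tablespoon but fewer per gram, so whether a honey recipe ends up more caloric depends on whether authors substitute by volume or by weight.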

Calories

  • Null Hypothesis (H₀): μ_honey - μ_sugar ≤ 0

  • Alternative Hypothesis (H₁): μ_honey - μ_sugar > 0

Sugar (% DV)

  • Null Hypothesis (H₀): μ_honey - μ_sugar = 0

  • Alternative Hypothesis (H₁): μ_honey - μ_sugar ≠ 0

Total Fat (% DV)

  • Null Hypothesis (H₀): μ_honey - μ_sugar ≤ 0

  • Alternative Hypothesis (H₁): μ_honey - μ_sugar > 0
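As a rough illustration of how hypotheses of this form could be tested, the sketch below runs a one-sided and a two-sided Welch's t-test with scipy and a simple one-sided permutation test. It uses synthetic data rather than the Food.com recipes, and the group sizes, distribution parameters, and the `permutation_pvalue` helper are assumptions made only for this example.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic stand-ins for the two groups (NOT the Food.com data).
honey = rng.gamma(shape=2.0, scale=150.0, size=200)
sugar = rng.gamma(shape=2.0, scale=170.0, size=2000)

# Welch's t-test (equal_var=False) does not assume equal group variances.
# alternative="greater" corresponds to H1: mu_honey - mu_sugar > 0.
t_stat, p_one_sided = stats.ttest_ind(
    honey, sugar, equal_var=False, alternative="greater"
)
# The default alternative is two-sided: H1: mu_honey - mu_sugar != 0.
_, p_two_sided = stats.ttest_ind(honey, sugar, equal_var=False)


def permutation_pvalue(a, b, n_perm=2000, seed=0):
    """One-sided permutation p-value for H1: mean(a) - mean(b) > 0."""
    perm_rng = np.random.default_rng(seed)
    observed = a.mean() - b.mean()
    pooled = np.concatenate([a, b])
    count = 0
    for _ in range(n_perm):
        shuffled = perm_rng.permutation(pooled)
        diff = shuffled[: a.size].mean() - shuffled[a.size:].mean()
        count += int(diff >= observed)
    # Add-one correction keeps the estimate strictly above zero.
    return (count + 1) / (n_perm + 1)


p_perm = permutation_pvalue(honey, sugar)
print(f"t = {t_stat:.3f}, one-sided p = {p_one_sided:.4f}, "
      f"two-sided p = {p_two_sided:.4f}, permutation p = {p_perm:.4f}")
```

A large one-sided p-value here would mean the data give no support for honey having the higher mean; it does not by itself show that sugar is higher, which is what the two-sided test or the opposite one-sided test would address.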


Data

Data overview

Dataset Name: Food.com Recipes

Link to the dataset: https://www.kaggle.com/datasets/shuyangli94/food-com-recipes-and-user-interactions/data?select=RAW_recipes.csv

Number of observations: 231,637 recipes

Number of variables: 12

Relevant variables:

tags: Tags are short, user-assigned descriptors that denote key qualities of a recipe, such as cook time, ingredients used, health-related descriptions, place of origin, and more. Note that these descriptors are self-reported, left entirely to the discretion of the contributor, and are not fact-checked. In our project, we use tags to keep only recipes labeled "desserts", which isolates the relevant rows before we distinguish between sugar and honey.

nutrition: Nutrition holds the health-related metrics we are analyzing. Each list gives the following values, in order: calories (#), total fat (%DV), sugar (%DV), sodium (%DV), protein (%DV), saturated fat (%DV), carbohydrates (%DV). In our project, we aim to use nutrition as our basis for determining the "healthiness" of a specific food, specifically analyzing the calories, total fat, and sugar content.

ingredients: Ingredients is a list, with all ingredients used in the recipe.

Shortcomings:

One shortcoming of this dataset is the lack of serving size descriptions. This matters because we are comparing calories, sugar, and fat content, and larger portion sizes would inflate these values. Additionally, as mentioned above, the tags used for classification are entirely user-generated, so we cannot know exactly what each contributor considers to be a dessert.


Food.com Recipes Dataset

Description

The dataset we are using contains information on recipes from Food.com. There are many important metrics contained in this dataset, but for our specific research question, the nutrition column is especially relevant. This column is formatted as a list of floats as (calories (#), total fat (PDV), sugar (PDV), sodium (PDV), protein (PDV), saturated fat (PDV), and carbohydrates (PDV)).

The three metrics from this nutrition column that matter most for our analysis are as follows. Calories are measured in kilocalories per serving and represent total energy content; very high values (>800 kcal) indicate especially energy-rich desserts. Total fat is measured in percent Daily Value, where 100% DV means one serving meets the entire recommended daily intake for fat. Sugar is also measured in percent Daily Value and indicates how much of the recommended daily sugar intake one serving provides; values near or above 100% suggest extremely high sugar content.
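The ordering of the nutrition list described above can be made explicit by mapping each position to a name. The following is a minimal sketch using only the standard library; the field names and the `parse_nutrition` helper are hypothetical labels chosen for this example, and the sample string is the nutrition value of the first row shown in the raw data.

```python
import ast

# Hypothetical field names, in the order the nutrition list is documented.
NUTRITION_FIELDS = [
    "calories",
    "total_fat_pdv",
    "sugar_pdv",
    "sodium_pdv",
    "protein_pdv",
    "sat_fat_pdv",
    "carbs_pdv",
]


def parse_nutrition(raw: str) -> dict:
    """Parse a stringified nutrition list into a {field: float} mapping."""
    values = ast.literal_eval(raw)  # safely evaluates the list literal
    if len(values) != len(NUTRITION_FIELDS):
        raise ValueError(f"expected {len(NUTRITION_FIELDS)} values, got {len(values)}")
    return {name: float(v) for name, v in zip(NUTRITION_FIELDS, values)}


example = parse_nutrition("[51.5, 0.0, 13.0, 0.0, 2.0, 0.0, 4.0]")
print(example["calories"], example["sugar_pdv"], example["total_fat_pdv"])
```

Checking the length up front makes a malformed row fail loudly instead of silently shifting values into the wrong field.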

Two additional columns are also necessary. The tags column holds a list of strings that lets us filter for desserts, and the ingredients column holds a list of strings that lets us determine which recipes use honey or sugar as the primary sweetener.

Concerns

One concern is that determining the primary sweetener requires inferring from the ingredients list. The dataset does not explicitly label honey or sugar as primary ingredients. Some recipes may include both honey and sugar, or use alternative sweeteners alongside them. Misclassification could blur differences between groups.

Another concern is serving size variability. Nutritional values are reported per serving, but serving sizes are defined by the individual authors and are not standardized across all recipes. As a result, differences in calories or % Daily Value may partly reflect differences in portion definitions rather than true differences driven by the type of sweetener.

Similarly, the nutrition values may be estimated by the authors instead of being strictly tested and measured, which could also introduce potential measurement errors/variability.

Imports

In [ ]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import warnings
import plotly.express as px
from scipy import stats
warnings.filterwarnings('ignore')

Data Loading

First, we load the raw data (uploaded as a CSV via Git LFS) using pandas:

In [ ]:
raw_df = pd.read_csv("data/00-raw/RAW_recipes.csv")
raw_df.head()
Out[ ]:
name id minutes contributor_id submitted tags nutrition n_steps steps description ingredients n_ingredients
0 arriba baked winter squash mexican style 137739 55 47892 2005-09-16 ['60-minutes-or-less', 'time-to-make', 'course... [51.5, 0.0, 13.0, 0.0, 2.0, 0.0, 4.0] 11 ['make a choice and proceed with recipe', 'dep... autumn is my favorite time of year to cook! th... ['winter squash', 'mexican seasoning', 'mixed ... 7
1 a bit different breakfast pizza 31490 30 26278 2002-06-17 ['30-minutes-or-less', 'time-to-make', 'course... [173.4, 18.0, 0.0, 17.0, 22.0, 35.0, 1.0] 9 ['preheat oven to 425 degrees f', 'press dough... this recipe calls for the crust to be prebaked... ['prepared pizza crust', 'sausage patty', 'egg... 6
2 all in the kitchen chili 112140 130 196586 2005-02-25 ['time-to-make', 'course', 'preparation', 'mai... [269.8, 22.0, 32.0, 48.0, 39.0, 27.0, 5.0] 6 ['brown ground beef in large pot', 'add choppe... this modified version of 'mom's' chili was a h... ['ground beef', 'yellow onions', 'diced tomato... 13
3 alouette potatoes 59389 45 68585 2003-04-14 ['60-minutes-or-less', 'time-to-make', 'course... [368.1, 17.0, 10.0, 2.0, 14.0, 8.0, 20.0] 11 ['place potatoes in a large pot of lightly sal... this is a super easy, great tasting, make ahea... ['spreadable cheese with garlic and herbs', 'n... 11
4 amish tomato ketchup for canning 44061 190 41706 2002-10-25 ['weeknight', 'time-to-make', 'course', 'main-... [352.9, 1.0, 337.0, 23.0, 3.0, 0.0, 28.0] 5 ['mix all ingredients& boil for 2 1 / 2 hours ... my dh's amish mother raised him on this recipe... ['tomato juice', 'apple cider vinegar', 'sugar... 8

Then, we check the number of observations and variables within this raw data:

In [ ]:
raw_df.shape
Out[ ]:
(231637, 12)

We list the variables within this raw data, understanding their data types:

In [ ]:
raw_df.dtypes
Out[ ]:
name              object
id                 int64
minutes            int64
contributor_id     int64
submitted         object
tags              object
nutrition         object
n_steps            int64
steps             object
description       object
ingredients       object
n_ingredients      int64
dtype: object

Data Preprocessing

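First, we select our variables of interest from the raw data:

In [ ]:
# .copy() gives an independent frame, so later column assignments do not act on a view of raw_df
processed_df = raw_df[['tags', 'nutrition', 'ingredients']].copy()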
processed_df.head()
Out[ ]:
tags nutrition ingredients
0 ['60-minutes-or-less', 'time-to-make', 'course... [51.5, 0.0, 13.0, 0.0, 2.0, 0.0, 4.0] ['winter squash', 'mexican seasoning', 'mixed ...
1 ['30-minutes-or-less', 'time-to-make', 'course... [173.4, 18.0, 0.0, 17.0, 22.0, 35.0, 1.0] ['prepared pizza crust', 'sausage patty', 'egg...
2 ['time-to-make', 'course', 'preparation', 'mai... [269.8, 22.0, 32.0, 48.0, 39.0, 27.0, 5.0] ['ground beef', 'yellow onions', 'diced tomato...
3 ['60-minutes-or-less', 'time-to-make', 'course... [368.1, 17.0, 10.0, 2.0, 14.0, 8.0, 20.0] ['spreadable cheese with garlic and herbs', 'n...
4 ['weeknight', 'time-to-make', 'course', 'main-... [352.9, 1.0, 337.0, 23.0, 3.0, 0.0, 28.0] ['tomato juice', 'apple cider vinegar', 'sugar...

Although our dataset looks consistent and tidy, we run a quick check for null values to flag any missing entries:

In [ ]:
processed_df.isnull().sum()
Out[ ]:
tags           0
nutrition      0
ingredients    0
dtype: int64

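Next, given that the list-like columns in the dataset are stored as strings, we convert the tags and nutrition columns into lists:

In [ ]:
def to_list(raw):
    # Strip the surrounding brackets and any quotes, then split on commas.
    # Every element is returned as a string; numeric conversion happens later.
    stripped = raw.strip("[]").replace("'", "").replace('"', "").strip()
    return [elem.strip() for elem in stripped.split(",")]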

processed_df["tags"] = processed_df["tags"].apply(to_list)
processed_df["nutrition"] = processed_df["nutrition"].apply(to_list)
processed_df
Out[ ]:
tags nutrition ingredients
0 [60-minutes-or-less, time-to-make, course, mai... [51.5, 0.0, 13.0, 0.0, 2.0, 0.0, 4.0] ['winter squash', 'mexican seasoning', 'mixed ...
1 [30-minutes-or-less, time-to-make, course, mai... [173.4, 18.0, 0.0, 17.0, 22.0, 35.0, 1.0] ['prepared pizza crust', 'sausage patty', 'egg...
2 [time-to-make, course, preparation, main-dish,... [269.8, 22.0, 32.0, 48.0, 39.0, 27.0, 5.0] ['ground beef', 'yellow onions', 'diced tomato...
3 [60-minutes-or-less, time-to-make, course, mai... [368.1, 17.0, 10.0, 2.0, 14.0, 8.0, 20.0] ['spreadable cheese with garlic and herbs', 'n...
4 [weeknight, time-to-make, course, main-ingredi... [352.9, 1.0, 337.0, 23.0, 3.0, 0.0, 28.0] ['tomato juice', 'apple cider vinegar', 'sugar...
... ... ... ...
231632 [ham, 60-minutes-or-less, time-to-make, course... [415.2, 26.0, 34.0, 26.0, 44.0, 21.0, 15.0] ['celery', 'onion', 'green sweet pepper', 'gar...
231633 [15-minutes-or-less, time-to-make, course, pre... [14.8, 0.0, 2.0, 58.0, 1.0, 0.0, 1.0] ['paprika', 'salt', 'garlic powder', 'onion po...
231634 [60-minutes-or-less, time-to-make, course, mai... [59.2, 6.0, 2.0, 3.0, 6.0, 5.0, 0.0] ['hard-cooked eggs', 'mayonnaise', 'dijon must...
231635 [30-minutes-or-less, time-to-make, course, pre... [188.0, 11.0, 57.0, 11.0, 7.0, 21.0, 9.0] ['butter', 'eagle brand condensed milk', 'ligh...
231636 [30-minutes-or-less, time-to-make, course, pre... [174.9, 14.0, 33.0, 4.0, 4.0, 11.0, 6.0] ['granulated sugar', 'shortening', 'eggs', 'fl...

231637 rows × 3 columns

Now, we identify recipes in the data as 'desserts' based on whether or not they contain a 'desserts' tag:

In [ ]:
# Flag each recipe whose parsed tag list includes the exact tag 'desserts'
processed_df['is_dessert'] = processed_df['tags'].apply(lambda tags: 'desserts' in tags)
processed_df
Out[ ]:
tags nutrition ingredients is_dessert
0 [60-minutes-or-less, time-to-make, course, mai... [51.5, 0.0, 13.0, 0.0, 2.0, 0.0, 4.0] ['winter squash', 'mexican seasoning', 'mixed ... False
1 [30-minutes-or-less, time-to-make, course, mai... [173.4, 18.0, 0.0, 17.0, 22.0, 35.0, 1.0] ['prepared pizza crust', 'sausage patty', 'egg... False
2 [time-to-make, course, preparation, main-dish,... [269.8, 22.0, 32.0, 48.0, 39.0, 27.0, 5.0] ['ground beef', 'yellow onions', 'diced tomato... False
3 [60-minutes-or-less, time-to-make, course, mai... [368.1, 17.0, 10.0, 2.0, 14.0, 8.0, 20.0] ['spreadable cheese with garlic and herbs', 'n... False
4 [weeknight, time-to-make, course, main-ingredi... [352.9, 1.0, 337.0, 23.0, 3.0, 0.0, 28.0] ['tomato juice', 'apple cider vinegar', 'sugar... False
... ... ... ... ...
231632 [ham, 60-minutes-or-less, time-to-make, course... [415.2, 26.0, 34.0, 26.0, 44.0, 21.0, 15.0] ['celery', 'onion', 'green sweet pepper', 'gar... False
231633 [15-minutes-or-less, time-to-make, course, pre... [14.8, 0.0, 2.0, 58.0, 1.0, 0.0, 1.0] ['paprika', 'salt', 'garlic powder', 'onion po... False
231634 [60-minutes-or-less, time-to-make, course, mai... [59.2, 6.0, 2.0, 3.0, 6.0, 5.0, 0.0] ['hard-cooked eggs', 'mayonnaise', 'dijon must... False
231635 [30-minutes-or-less, time-to-make, course, pre... [188.0, 11.0, 57.0, 11.0, 7.0, 21.0, 9.0] ['butter', 'eagle brand condensed milk', 'ligh... True
231636 [30-minutes-or-less, time-to-make, course, pre... [174.9, 14.0, 33.0, 4.0, 4.0, 11.0, 6.0] ['granulated sugar', 'shortening', 'eggs', 'fl... True

231637 rows × 4 columns

We then filter the data to only include dessert recipes, dropping the tags and the helper column:

In [ ]:
processed_df = processed_df[processed_df['is_dessert']].drop(columns=['tags', 'is_dessert']).reset_index(drop=True)
processed_df
Out[ ]:
nutrition ingredients
0 [4270.8, 254.0, 1306.0, 111.0, 127.0, 431.0, 2... ['chocolate sandwich style cookies', 'chocolat...
1 [734.1, 66.0, 199.0, 10.0, 10.0, 117.0, 28.0] ['vanilla wafers', 'butter', 'powdered sugar',...
2 [232.7, 21.0, 77.0, 4.0, 6.0, 38.0, 8.0] ['butterscotch chips', 'chinese noodles', 'sal...
3 [1663.3, 221.0, 168.0, 66.0, 19.0, 158.0, 29.0] ['all-purpose flour', 'granulated sugar', 'bak...
4 [174.4, 13.0, 67.0, 5.0, 4.0, 26.0, 7.0] ['butter', 'sugar', 'vanilla', 'eggs', 'all-pu...
... ... ...
43198 [561.3, 38.0, 122.0, 2.0, 16.0, 76.0, 25.0] ['all-purpose flour', 'unsalted butter', 'egg'...
43199 [535.0, 29.0, 194.0, 18.0, 15.0, 15.0, 28.0] ['margarine', 'all-purpose flour', 'sugar', 'b...
43200 [56.2, 2.0, 4.0, 1.0, 2.0, 3.0, 3.0] ['sugar', 'active dry yeast', 'milk', 'butter'...
43201 [188.0, 11.0, 57.0, 11.0, 7.0, 21.0, 9.0] ['butter', 'eagle brand condensed milk', 'ligh...
43202 [174.9, 14.0, 33.0, 4.0, 4.0, 11.0, 6.0] ['granulated sugar', 'shortening', 'eggs', 'fl...

43203 rows × 2 columns

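Based on the ingredients column (still stored as a raw string), we flag recipes whose ingredient text contains the substring "honey" and those whose text contains the substring "sugar":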

In [ ]:
processed_df["contains_honey"] = processed_df["ingredients"].str.contains("honey", case=False)
processed_df["contains_sugar"] = processed_df["ingredients"].str.contains("sugar", case=False)
processed_df
Out[ ]:
nutrition ingredients contains_honey contains_sugar
0 [4270.8, 254.0, 1306.0, 111.0, 127.0, 431.0, 2... ['chocolate sandwich style cookies', 'chocolat... False False
1 [734.1, 66.0, 199.0, 10.0, 10.0, 117.0, 28.0] ['vanilla wafers', 'butter', 'powdered sugar',... False True
2 [232.7, 21.0, 77.0, 4.0, 6.0, 38.0, 8.0] ['butterscotch chips', 'chinese noodles', 'sal... False False
3 [1663.3, 221.0, 168.0, 66.0, 19.0, 158.0, 29.0] ['all-purpose flour', 'granulated sugar', 'bak... False True
4 [174.4, 13.0, 67.0, 5.0, 4.0, 26.0, 7.0] ['butter', 'sugar', 'vanilla', 'eggs', 'all-pu... False True
... ... ... ... ...
43198 [561.3, 38.0, 122.0, 2.0, 16.0, 76.0, 25.0] ['all-purpose flour', 'unsalted butter', 'egg'... False True
43199 [535.0, 29.0, 194.0, 18.0, 15.0, 15.0, 28.0] ['margarine', 'all-purpose flour', 'sugar', 'b... False True
43200 [56.2, 2.0, 4.0, 1.0, 2.0, 3.0, 3.0] ['sugar', 'active dry yeast', 'milk', 'butter'... False True
43201 [188.0, 11.0, 57.0, 11.0, 7.0, 21.0, 9.0] ['butter', 'eagle brand condensed milk', 'ligh... False True
43202 [174.9, 14.0, 33.0, 4.0, 4.0, 11.0, 6.0] ['granulated sugar', 'shortening', 'eggs', 'fl... False True

43203 rows × 4 columns
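Substring matching of this kind is simple, but it can also flag ingredients that merely contain the letters, such as "honeydew melon". One possible refinement, sketched here as an assumption and not as the method used in this notebook, is to match whole words within each ingredient. The `mentions` helper is hypothetical and expects an already parsed list of ingredient strings.

```python
import re


def mentions(ingredients, word):
    """Return True if any ingredient contains `word` as a whole word."""
    pattern = re.compile(rf"\b{re.escape(word)}\b", flags=re.IGNORECASE)
    return any(pattern.search(item) for item in ingredients)


print(mentions(["clover honey", "butter"], "honey"))     # whole-word match
print(mentions(["honeydew melon", "lime"], "honey"))     # 'honeydew' is not 'honey'
print(mentions(["brown sugar", "flour"], "sugar"))       # whole-word match
print(mentions(["sugar snap peas", "salt"], "sugar"))    # still matches: a known limitation
```

Whole-word matching removes cases like "honeydew" but not multi-word ingredients such as "sugar snap peas" or "sugar-free" products, so an explicit exclusion list would still be needed for a stricter classification.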

Based on this information, we flag recipes containing only one primary sweetener (i.e. only honey or only sugar):

In [ ]:
# A recipe is "only honey" if it mentions honey and not sugar, and vice versa
processed_df["only_honey"] = processed_df["contains_honey"] & ~processed_df["contains_sugar"]
processed_df["only_sugar"] = ~processed_df["contains_honey"] & processed_df["contains_sugar"]
processed_df = processed_df.drop(columns=["contains_honey", "contains_sugar"])
processed_df
Out[ ]:
nutrition ingredients only_honey only_sugar
0 [4270.8, 254.0, 1306.0, 111.0, 127.0, 431.0, 2... ['chocolate sandwich style cookies', 'chocolat... False False
1 [734.1, 66.0, 199.0, 10.0, 10.0, 117.0, 28.0] ['vanilla wafers', 'butter', 'powdered sugar',... False True
2 [232.7, 21.0, 77.0, 4.0, 6.0, 38.0, 8.0] ['butterscotch chips', 'chinese noodles', 'sal... False False
3 [1663.3, 221.0, 168.0, 66.0, 19.0, 158.0, 29.0] ['all-purpose flour', 'granulated sugar', 'bak... False True
4 [174.4, 13.0, 67.0, 5.0, 4.0, 26.0, 7.0] ['butter', 'sugar', 'vanilla', 'eggs', 'all-pu... False True
... ... ... ... ...
43198 [561.3, 38.0, 122.0, 2.0, 16.0, 76.0, 25.0] ['all-purpose flour', 'unsalted butter', 'egg'... False True
43199 [535.0, 29.0, 194.0, 18.0, 15.0, 15.0, 28.0] ['margarine', 'all-purpose flour', 'sugar', 'b... False True
43200 [56.2, 2.0, 4.0, 1.0, 2.0, 3.0, 3.0] ['sugar', 'active dry yeast', 'milk', 'butter'... False True
43201 [188.0, 11.0, 57.0, 11.0, 7.0, 21.0, 9.0] ['butter', 'eagle brand condensed milk', 'ligh... False True
43202 [174.9, 14.0, 33.0, 4.0, 4.0, 11.0, 6.0] ['granulated sugar', 'shortening', 'eggs', 'fl... False True

43203 rows × 4 columns

Now, we can exclude desserts that do not have exactly one primary sweetener (i.e. those containing neither honey nor sugar, or containing both):

In [ ]:
# Keep rows where exactly one of the two flags is True
processed_df = processed_df[processed_df["only_honey"] != processed_df["only_sugar"]]
processed_df = processed_df.reset_index(drop=True)
processed_df
Out[ ]:
nutrition ingredients only_honey only_sugar
0 [734.1, 66.0, 199.0, 10.0, 10.0, 117.0, 28.0] ['vanilla wafers', 'butter', 'powdered sugar',... False True
1 [1663.3, 221.0, 168.0, 66.0, 19.0, 158.0, 29.0] ['all-purpose flour', 'granulated sugar', 'bak... False True
2 [174.4, 13.0, 67.0, 5.0, 4.0, 26.0, 7.0] ['butter', 'sugar', 'vanilla', 'eggs', 'all-pu... False True
3 [5467.4, 516.0, 1196.0, 135.0, 110.0, 615.0, 1... ['shortening', 'icing sugar', 'vanilla', 'all-... False True
4 [175.2, 11.0, 15.0, 8.0, 7.0, 21.0, 7.0] ['flour', 'salt', 'baking powder', 'sugar', 'b... False True
... ... ... ... ...
34752 [561.3, 38.0, 122.0, 2.0, 16.0, 76.0, 25.0] ['all-purpose flour', 'unsalted butter', 'egg'... False True
34753 [535.0, 29.0, 194.0, 18.0, 15.0, 15.0, 28.0] ['margarine', 'all-purpose flour', 'sugar', 'b... False True
34754 [56.2, 2.0, 4.0, 1.0, 2.0, 3.0, 3.0] ['sugar', 'active dry yeast', 'milk', 'butter'... False True
34755 [188.0, 11.0, 57.0, 11.0, 7.0, 21.0, 9.0] ['butter', 'eagle brand condensed milk', 'ligh... False True
34756 [174.9, 14.0, 33.0, 4.0, 4.0, 11.0, 6.0] ['granulated sugar', 'shortening', 'eggs', 'fl... False True

34757 rows × 4 columns

Since the sweetener flags now carry the information we need, we can drop the ingredients column:

In [ ]:
processed_df = processed_df.drop(columns=['ingredients'])
processed_df
Out[ ]:
nutrition only_honey only_sugar
0 [734.1, 66.0, 199.0, 10.0, 10.0, 117.0, 28.0] False True
1 [1663.3, 221.0, 168.0, 66.0, 19.0, 158.0, 29.0] False True
2 [174.4, 13.0, 67.0, 5.0, 4.0, 26.0, 7.0] False True
3 [5467.4, 516.0, 1196.0, 135.0, 110.0, 615.0, 1... False True
4 [175.2, 11.0, 15.0, 8.0, 7.0, 21.0, 7.0] False True
... ... ... ...
34752 [561.3, 38.0, 122.0, 2.0, 16.0, 76.0, 25.0] False True
34753 [535.0, 29.0, 194.0, 18.0, 15.0, 15.0, 28.0] False True
34754 [56.2, 2.0, 4.0, 1.0, 2.0, 3.0, 3.0] False True
34755 [188.0, 11.0, 57.0, 11.0, 7.0, 21.0, 9.0] False True
34756 [174.9, 14.0, 33.0, 4.0, 4.0, 11.0, 6.0] False True

34757 rows × 3 columns

Now, we split the nutrition lists into a separate table with one named column per value, based on the metadata provided:

In [ ]:
col_names = ["calories", "total_fat (%DV)", "sugar (%DV)", "sodium (%DV)", "protein (%DV)", "sat_fat (%DV)", "carbs (%DV)"]
nutrition_df = pd.DataFrame(processed_df["nutrition"].tolist())
nutrition_df.columns = col_names
nutrition_df = nutrition_df.apply(pd.to_numeric).astype(float)
nutrition_df
Out[ ]:
calories total_fat (%DV) sugar (%DV) sodium (%DV) protein (%DV) sat_fat (%DV) carbs (%DV)
0 734.1 66.0 199.0 10.0 10.0 117.0 28.0
1 1663.3 221.0 168.0 66.0 19.0 158.0 29.0
2 174.4 13.0 67.0 5.0 4.0 26.0 7.0
3 5467.4 516.0 1196.0 135.0 110.0 615.0 188.0
4 175.2 11.0 15.0 8.0 7.0 21.0 7.0
... ... ... ... ... ... ... ...
34752 561.3 38.0 122.0 2.0 16.0 76.0 25.0
34753 535.0 29.0 194.0 18.0 15.0 15.0 28.0
34754 56.2 2.0 4.0 1.0 2.0 3.0 3.0
34755 188.0 11.0 57.0 11.0 7.0 21.0 9.0
34756 174.9 14.0 33.0 4.0 4.0 11.0 6.0

34757 rows × 7 columns

We then join these nutritional values to the filtered desserts data and assign the index an identifier:

In [ ]:
# Both frames share the same 0..n-1 index, so a column-wise concat aligns rows one-to-one
processed_df = pd.concat([processed_df.drop(columns=["nutrition"]), nutrition_df], axis=1)
processed_df.index.name = 'dessert_id'
processed_df
Out[ ]:
only_honey only_sugar calories total_fat (%DV) sugar (%DV) sodium (%DV) protein (%DV) sat_fat (%DV) carbs (%DV)
dessert_id
0 False True 734.1 66.0 199.0 10.0 10.0 117.0 28.0
1 False True 1663.3 221.0 168.0 66.0 19.0 158.0 29.0
2 False True 174.4 13.0 67.0 5.0 4.0 26.0 7.0
3 False True 5467.4 516.0 1196.0 135.0 110.0 615.0 188.0
4 False True 175.2 11.0 15.0 8.0 7.0 21.0 7.0
... ... ... ... ... ... ... ... ... ...
34752 False True 561.3 38.0 122.0 2.0 16.0 76.0 25.0
34753 False True 535.0 29.0 194.0 18.0 15.0 15.0 28.0
34754 False True 56.2 2.0 4.0 1.0 2.0 3.0 3.0
34755 False True 188.0 11.0 57.0 11.0 7.0 21.0 9.0
34756 False True 174.9 14.0 33.0 4.0 4.0 11.0 6.0

34757 rows × 9 columns

Summary Statistics

Computing basic summary statistics (mean, median, and standard deviation) grouped by primary sweetener:

In [ ]:
var_cols = ['calories', 'total_fat (%DV)', 'sugar (%DV)', 'sodium (%DV)', 'protein (%DV)', 'sat_fat (%DV)', 'carbs (%DV)']
processed_df.groupby(['only_honey','only_sugar'])[var_cols].agg(['mean','median','std'])
Out[ ]:
calories total_fat (%DV) sugar (%DV) sodium (%DV) protein (%DV) sat_fat (%DV) carbs (%DV)
mean median std mean median std mean median std mean ... std mean median std mean median std mean median std
only_honey only_sugar
False True 652.526698 314.9 1250.196051 48.962675 21.0 103.249706 229.333745 104.0 472.586461 15.973831 ... 41.856871 16.724302 8.0 34.572365 74.437713 30.0 163.696113 28.259125 13.0 56.071097
True False 340.492357 196.0 540.636840 22.686624 9.0 44.201373 132.029299 80.0 218.588061 7.439490 ... 26.092051 12.364331 7.0 21.829751 28.863694 10.0 57.253023 15.704459 10.0 25.626921

2 rows × 21 columns

To decide how to filter outliers, we first view the calorie distributions on a logarithmic scale:

In [ ]:
processed_df['primary_sweetener'] = np.where(processed_df['only_honey'], 'Honey', 'Sugar')
plt.figure(figsize=(6,4))  # create the figure before plotting so the size applies to this plot
sns.boxplot(x='primary_sweetener', y='calories', data=processed_df)
plt.yscale('log')
plt.title('Calories Distribution (log scale): Honey vs Sugar Desserts')
plt.xlabel('Primary Sweetener')
plt.ylabel('Calories')
plt.show()
[Figure: box plot of calories (log scale) by primary sweetener, Honey vs Sugar]

Filtering caloric outliers using the 1.5 × IQR rule, with the interquartile range (IQR) computed on the pooled data:

In [ ]:
Q1 = processed_df['calories'].quantile(0.25)
Q3 = processed_df['calories'].quantile(0.75)
IQR = Q3 - Q1

# Fences beyond which a recipe is treated as an outlier
lower_bound = Q1 - 1.5 * IQR
upper_bound = Q3 + 1.5 * IQR

processed_df = processed_df[(processed_df['calories'] >= lower_bound) & (processed_df['calories'] <= upper_bound)]
processed_df
Out[ ]:
only_honey only_sugar calories total_fat (%DV) sugar (%DV) sodium (%DV) protein (%DV) sat_fat (%DV) carbs (%DV) primary_sweetener
dessert_id
0 False True 734.1 66.0 199.0 10.0 10.0 117.0 28.0 Sugar
2 False True 174.4 13.0 67.0 5.0 4.0 26.0 7.0 Sugar
4 False True 175.2 11.0 15.0 8.0 7.0 21.0 7.0 Sugar
5 False True 387.6 39.0 98.0 12.0 12.0 67.0 11.0 Sugar
6 False True 456.6 32.0 164.0 15.0 11.0 63.0 20.0 Sugar
... ... ... ... ... ... ... ... ... ... ...
34752 False True 561.3 38.0 122.0 2.0 16.0 76.0 25.0 Sugar
34753 False True 535.0 29.0 194.0 18.0 15.0 15.0 28.0 Sugar
34754 False True 56.2 2.0 4.0 1.0 2.0 3.0 3.0 Sugar
34755 False True 188.0 11.0 57.0 11.0 7.0 21.0 9.0 Sugar
34756 False True 174.9 14.0 33.0 4.0 4.0 11.0 6.0 Sugar

30986 rows × 10 columns
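To make the filtering rule concrete, the sketch below applies the same 1.5 × IQR fences to a small made-up array of calorie values. The numbers and the `iqr_bounds` helper are illustrative assumptions, not values from the dataset.

```python
import numpy as np


def iqr_bounds(values, k=1.5):
    """Return the (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Made-up calorie values with one extreme recipe.
calories = np.array([100.0, 150.0, 200.0, 250.0, 300.0, 5000.0])
lower, upper = iqr_bounds(calories)
kept = calories[(calories >= lower) & (calories <= upper)]

print(f"fences: [{lower}, {upper}]")
print(f"kept {kept.size} of {calories.size} values")
```

In this toy example the lower fence is negative, so only the high end is trimmed. Because calories cannot be negative, a similar effect is plausible in the real data, where the rule acts mainly as a cap on very high-calorie recipes.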

In [ ]:
plt.figure(figsize=(6,4))  # create the figure before plotting so the size applies to this plot
sns.boxplot(x='primary_sweetener', y='calories', data=processed_df)
plt.title('Calories: Honey vs Sugar Desserts (Outliers Removed)')
plt.xlabel('Primary Sweetener')
plt.ylabel('Calories')
plt.show()
[Figure: box plot of calories by primary sweetener, Honey vs Sugar, after outlier removal]

Understanding the types of variables in our resulting data:

In [ ]:
processed_df.dtypes
Out[ ]:
only_honey              bool
only_sugar              bool
calories             float64
total_fat (%DV)      float64
sugar (%DV)          float64
sodium (%DV)         float64
protein (%DV)        float64
sat_fat (%DV)        float64
carbs (%DV)          float64
primary_sweetener     object
dtype: object

Counting the number of variables in our resulting data, aggregated by their data type:

In [ ]:
processed_df.dtypes.value_counts()
Out[ ]:
float64    7
bool       2
object     1
Name: count, dtype: int64

Finally, compiling the number of desserts based on their primary sweetener:

In [ ]:
processed_df['primary_sweetener'].value_counts()
Out[ ]:
primary_sweetener
Sugar    30236
Honey      750
Name: count, dtype: int64

With this imbalance noted, we write the final processed dataset to a CSV file:

In [ ]:
processed_df.to_csv("data/02-processed/desserts.csv")

Results¶

Exploratory Data Analysis¶

Section 1: Comparing Honey vs Sugar Desserts on Key Nutrition Metrics¶

To address our research question, we begin by exploring how dessert recipes differ by primary sweetener (Honey vs Sugar) in three core variables: calories, sugar (%DV), and total_fat (%DV).

We explicitly load the fully wrangled dataset from data/02-processed/desserts.csv so this section can run independently without re-running data wrangling. We then:

  • Check sample sizes for each sweetener group to understand balance and representation
  • Compute descriptive statistics to compare center and spread
  • Visualize distributions with boxplots and overlaid histograms to assess skew and outliers

These exploratory results help us understand whether group-level differences are large enough and consistent enough to justify formal hypothesis testing in later sections.

In [ ]:
desserts = pd.read_csv('data/02-processed/desserts.csv')

required_cols = ['primary_sweetener', 'calories', 'sugar (%DV)', 'total_fat (%DV)']
missing = [col for col in required_cols if col not in desserts.columns]
if missing:
    raise ValueError(f'Missing columns in cleaned data: {missing}')

eda_df = desserts[desserts['primary_sweetener'].isin(['Honey', 'Sugar'])].copy()
metrics = ['calories', 'sugar (%DV)', 'total_fat (%DV)']
eda_df = eda_df.dropna(subset=metrics + ['primary_sweetener'])

print(f'Rows in EDA subset: {len(eda_df):,}')
display(eda_df.head())
Rows in EDA subset: 30,986
   dessert_id  only_honey  only_sugar  calories  total_fat (%DV)  sugar (%DV)  sodium (%DV)  protein (%DV)  sat_fat (%DV)  carbs (%DV)  primary_sweetener
0           0       False        True     734.1             66.0        199.0          10.0           10.0          117.0         28.0              Sugar
1           2       False        True     174.4             13.0         67.0           5.0            4.0           26.0          7.0              Sugar
2           4       False        True     175.2             11.0         15.0           8.0            7.0           21.0          7.0              Sugar
3           5       False        True     387.6             39.0         98.0          12.0           12.0           67.0         11.0              Sugar
4           6       False        True     456.6             32.0        164.0          15.0           11.0           63.0         20.0              Sugar

Grouping the data by primary sweetener and comparing summary statistics:

In [ ]:
print('Group counts:')
print(eda_df['primary_sweetener'].value_counts())

print('\nDescriptive statistics by sweetener group:')
display(eda_df.groupby('primary_sweetener')[metrics].describe().round(2))

median_compare = (
    eda_df.groupby('primary_sweetener')[metrics]
    .median()
    .rename_axis('primary_sweetener')
    .round(2))

print('\nMedian comparison (Honey vs Sugar):')
display(median_compare)
Group counts:
primary_sweetener
Sugar    30236
Honey      750
Name: count, dtype: int64

Descriptive statistics by sweetener group:
calories
                     count    mean     std  min     25%    50%     75%     max
primary_sweetener
Honey                750.0  241.12  193.68  0.4  109.25  186.0  307.75  1105.1
Sugar              30236.0  328.97  224.31  0.3  152.30  278.9  450.32  1122.8

sugar (%DV)
                     count    mean  ...    75%     max
primary_sweetener
Honey                750.0   94.66  ...  130.0   695.0
Sugar              30236.0  114.96  ...  155.0  1087.0

total_fat (%DV)
                     count   mean    std  min  25%   50%   75%    max
primary_sweetener
Honey                750.0  15.19  18.77  0.0  3.0   8.0  21.0  137.0
Sugar              30236.0  23.95  21.39  0.0  8.0  18.0  34.0  168.0

2 rows × 24 columns

Median comparison (Honey vs Sugar):
                   calories  sugar (%DV)  total_fat (%DV)
primary_sweetener
Honey                 186.0         74.0              8.0
Sugar                 278.9         93.0             18.0

Visualizing the group differences with boxplots before proceeding with formal statistical tests:

In [ ]:
fig, axes = plt.subplots(1, 3, figsize=(16, 4.8))

for i, metric in enumerate(metrics):
    sns.boxplot(
        data=eda_df, x='primary_sweetener', y=metric,
        order=['Honey', 'Sugar'], ax=axes[i], palette='Set2'
    )
    axes[i].set_title(f'{metric} by Primary Sweetener')
    axes[i].set_xlabel('Primary Sweetener')
    axes[i].set_ylabel(metric)

plt.tight_layout()
plt.show()
[Figure: three boxplots by Primary Sweetener (Honey, Sugar), one each for calories, sugar (%DV), and total_fat (%DV)]

Understanding the underlying distributions of calories, sugar (%DV) and total fat (%DV) using histograms:

In [ ]:
fig, axes = plt.subplots(1, 3, figsize=(16, 4.8))

for i, metric in enumerate(metrics):
    sns.histplot(
        data=eda_df, x=metric, hue='primary_sweetener',
        hue_order=['Honey', 'Sugar'], bins=30, stat='density',
        common_norm=False, kde=True, alpha=0.35, ax=axes[i]
    )
    axes[i].set_title(f'Distribution of {metric}')
    axes[i].set_xlabel(metric)
    axes[i].set_ylabel('Density')

plt.tight_layout()
plt.show()
[Figure: overlaid density histograms with KDE curves for calories, sugar (%DV), and total_fat (%DV), colored by primary sweetener]

Interpretation: Across all three metrics, honey-only desserts have lower median values than sugar-only desserts. The median calorie count is 186 kcal for honey compared to 278.9 kcal for sugar, and the same pattern holds for sugar (%DV) (74 vs 93) and total fat (%DV) (8 vs 18). Both groups also show strong right skew with long upper tails, so the means sit above the medians (for example, mean calories of 241.12 against a median of 186.0 for honey). These distributional properties are examined further in Section 3 before formal hypothesis testing.


Section 2: Group Representation and Class Imbalance¶

Next, we examine the balance between honey-only and sugar-only dessert recipes in our filtered dataset. The group counts above show a substantial imbalance, which can affect the reliability of our statistical comparisons, so it is important to understand the representation of each group before any formal statistical testing.

In [ ]:
fig, ax = plt.subplots(figsize=(6, 4))
counts = eda_df['primary_sweetener'].value_counts()
sns.barplot(x=counts.index, y=counts.values, palette='Set2', ax=ax)
ax.set_title('Number of Recipes by Primary Sweetener')
ax.set_xlabel('Primary Sweetener')
ax.set_ylabel('Count')
for i, v in enumerate(counts.values):
    ax.text(i, v + 100, f'{v:,}', ha='center')
plt.tight_layout()
plt.show()
[Figure: bar chart, "Number of Recipes by Primary Sweetener"; x-axis Primary Sweetener, y-axis Count]

Interpretation: The dataset is heavily imbalanced, with 30,236 sugar-only desserts compared to just 750 honey-only desserts. This is roughly a 40:1 ratio. Our concern is not with instability in the sugar group, but rather that its large sample size inflates statistical power, meaning even trivially small differences could appear statistically significant.

However, downsampling the sugar group to 750 random recipes would discard most of its data and could misrepresent its distribution. We therefore keep the imbalanced groups as they are. Welch's t-test does not assume equal group sizes or variances, and the Central Limit Theorem supports approximate normality of each group's sample mean at these sample sizes. Moreover, we will report Cohen's d alongside p-values to distinguish statistical significance from practical relevance.


Section 3: Outlier and Distribution Assumptions¶

Next, we address certain assumptions within our dataset:

  • Outliers: Rather than applying independent IQR filtering to each metric, we rely on the calorie-based outlier removal applied during preprocessing. Since calories are largely determined by fat and carbohydrate (including sugar) content, many recipes with extreme sugar (%DV) or total fat (%DV) values would likely have already been removed as calorie outliers. This is an intentional choice: filtering on calories alone preserves the remaining distributions of sugar (%DV) and total fat (%DV) instead of trimming each one separately.

  • Distributions: Based on the right skews seen in the histograms in Section 1, we use QQ-plots to check for normality. While we can still work with data that is skewed, it is important to gauge the shape and extent of skew as it directly affects our interpretation of results.

In [ ]:
fig, axes = plt.subplots(2, 3, figsize=(16, 8))

for j, group in enumerate(['Honey', 'Sugar']):
    group_data = eda_df[eda_df['primary_sweetener'] == group]
    for i, metric in enumerate(metrics):
        stats.probplot(group_data[metric], dist='norm', plot=axes[j, i])
        axes[j, i].set_title(f'{group}: {metric}')

plt.suptitle('QQ Plots by Primary Sweetener and Metric', y=1.02)
plt.tight_layout()
plt.show()
[Figure: "QQ Plots by Primary Sweetener and Metric"; 2 × 3 grid with rows Honey and Sugar and columns calories, sugar (%DV), total_fat (%DV)]

Interpretation: The QQ plots confirm significant right skew across all three metrics for both groups, i.e. the upper tails curve sharply away from the theoretical line in every plot, indicating that extreme high values are more common than a normal distribution would predict. This is consistent with the histograms observed in Section 1, where there was a visible right skew.

Observing the QQ-plots, the honey group shows more pronounced deviation at the upper tail compared to the sugar group, likely reflecting its smaller sample size making it more sensitive to individual extreme recipes.

Despite this, both groups have large sample sizes (n=750 for honey, n=30,236 for sugar), so by the Central Limit Theorem the sampling distributions of the group means should be approximately normal even though the underlying distributions are skewed. As such, we can proceed with Welch's independent samples t-tests, noting that results should be interpreted alongside Cohen's d effect sizes given the group imbalance discussed in Section 2.
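
The Central Limit Theorem argument can be checked informally by resampling. The sketch below is an illustration only: it uses synthetic right-skewed (lognormal) data rather than the recipe data, and the seed, distribution parameters, and number of resamples are arbitrary assumptions. It compares the skewness of the raw values with the skewness of bootstrapped sample means at n = 750.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic right-skewed sample standing in for a 750-recipe group (not real data).
sample = rng.lognormal(mean=5.0, sigma=0.8, size=750)

# Bootstrap the sample mean: resample with replacement and record each mean.
boot_means = np.array([
    rng.choice(sample, size=sample.size, replace=True).mean()
    for _ in range(2000)
])

skew_raw = stats.skew(sample)        # skewness of the individual values
skew_means = stats.skew(boot_means)  # skewness of the resampled means

print(f"Skewness of raw values:       {skew_raw:.2f}")
print(f"Skewness of bootstrap means:  {skew_means:.2f}")
```

If the skewness of the bootstrapped means is close to zero while the raw values remain strongly skewed, that is consistent with the sample mean being approximately normal at this sample size. The same check could be run on each sweetener group's actual column.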


Section 4: Correlation between Nutrition Metrics¶

Finally, we examine whether calories, sugar (%DV), and total fat (%DV) are correlated with one another. If the metrics are highly correlated, the three hypothesis tests do not provide independent evidence, and treating them as if they did would distort our inferential conclusions.

In [ ]:
fig, ax = plt.subplots(figsize=(6, 4))
corr = eda_df[metrics].corr()
sns.heatmap(corr, annot=True, fmt='.2f', cmap='coolwarm', 
            ax=ax, vmin=-1, vmax=1, square=True)
ax.set_title('Correlation Between Nutritional Metrics')
plt.tight_layout()
plt.show()
[Figure: heatmap, "Correlation Between Nutritional Metrics"]

Interpretation: Based on the correlation heatmap, there seems to be a positive correlation between all three metrics. Calories and total fat (%DV) are most strongly correlated (r=0.86), which is expected given that fat is the most calorie-dense macronutrient at 9 kcal/g. Calories and sugar (%DV) are also strongly correlated (r=0.77), while total fat (%DV) and sugar (%DV) show a weaker but still positive correlation (r=0.43), suggesting that high-fat desserts are not necessarily high-sugar and vice versa.

As a result, we conclude that these variables are not independent of one another and should not be interpreted in isolation in downstream analysis. Since we are conducting three separate hypothesis tests on correlated outcomes, we apply a Bonferroni correction (α = 0.05/3 ≈ 0.017) to control the family-wise error rate, which we carry forward into the following sections.
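
The Bonferroni arithmetic can be sketched in a few lines. The p-values below are hypothetical and chosen only to show how the corrected threshold changes a decision; they are not results from this analysis, and the variable names are illustrative.

```python
alpha = 0.05
tested_metrics = ["calories", "sugar (%DV)", "total_fat (%DV)"]

# Bonferroni: divide the overall alpha by the number of tests.
alpha_bonferroni = alpha / len(tested_metrics)

# Hypothetical p-values for illustration only (not results from this analysis).
example_p_values = {"calories": 0.030, "sugar (%DV)": 0.001, "total_fat (%DV)": 0.200}

# Reject H0 only when the p-value falls below the corrected threshold.
decisions = {m: example_p_values[m] < alpha_bonferroni for m in tested_metrics}

print(f"Bonferroni-corrected alpha: {alpha_bonferroni:.4f}")
for m in tested_metrics:
    verdict = "reject H0" if decisions[m] else "fail to reject H0"
    print(f"{m}: p = {example_p_values[m]:.3f} -> {verdict}")
```

Note that the hypothetical p-value of 0.030 would pass an uncorrected 0.05 threshold but not the corrected one, which is the situation the correction is designed to guard against.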


Hypothesis Testing¶

Having completed our exploratory data analysis, we now conduct formal hypothesis testing to understand whether the observed nutritional differences between honey and sugar desserts are statistically significant. To do this, we conduct a Welch's independent samples t-test as the primary hypothesis test, chosen due to unequal sample sizes and variances, and a permutation test as a non-parametric check that does not require any assumptions about the shape of the underlying distribution. First, we restate our hypotheses:

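Calories

  • Null Hypothesis ($H_0$): $\mu_{\text{honey}} - \mu_{\text{sugar}} \le 0$

  • Alternate Hypothesis ($H_1$): $\mu_{\text{honey}} - \mu_{\text{sugar}} > 0$

Sugar (% DV)

  • Null Hypothesis ($H_0$): $\mu_{\text{honey}} - \mu_{\text{sugar}} = 0$

  • Alternate Hypothesis ($H_1$): $\mu_{\text{honey}} - \mu_{\text{sugar}} \ne 0$

Total Fat (% DV)

  • Null Hypothesis ($H_0$): $\mu_{\text{honey}} - \mu_{\text{sugar}} \le 0$

  • Alternate Hypothesis ($H_1$): $\mu_{\text{honey}} - \mu_{\text{sugar}} > 0$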


Section 1: Welch's t-test¶

To formally test our hypotheses about the nutritional differences between honey- and sugar-sweetened desserts, we first used a Welch's independent samples t-test, given that our groups are unequal in size and have differing variances. In doing so, we can assess whether the observed differences in calories, sugar (%DV), and total fat (%DV) are statistically significant.

In [ ]:
metrics = ['calories', 'sugar (%DV)', 'total_fat (%DV)']
results = []

for metric in metrics:
    honey = desserts[desserts['primary_sweetener'] == 'Honey'][metric].dropna()
    sugar = desserts[desserts['primary_sweetener'] == 'Sugar'][metric].dropna()
    
    t_stat, p_two_tailed = stats.ttest_ind(honey, sugar, equal_var=False)
    if metric in ['calories', 'total_fat (%DV)']:
        p_val = p_two_tailed / 2 if t_stat > 0 else 1 - (p_two_tailed / 2)
        test_type = 'one-tailed'
    else:
        p_val = p_two_tailed
        test_type = 'two-tailed'
    
    # Cohen's d with the pooled standard deviation (var1, var2 are sample variances)
    n1, n2 = len(honey), len(sugar)
    var1, var2 = honey.var(ddof=1), sugar.var(ddof=1)
    pooled_sd = np.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
    cohen_d = (honey.mean() - sugar.mean()) / pooled_sd
    
    results.append({
        'metric': metric,
        'test_type': test_type,
        'honey_mean': honey.mean(),
        'sugar_mean': sugar.mean(),
        't_stat': t_stat,
        'p_value': p_val,
        'cohen_d': cohen_d
    })


welch_results = pd.DataFrame(results).round(4)
welch_results
Out[ ]:
            metric   test_type  honey_mean  sugar_mean    t_stat  p_value  cohen_d
0         calories  one-tailed    241.1173    328.9731  -12.2212      1.0  -0.3929
1      sugar (%DV)  two-tailed     94.6573    114.9578   -6.9343      0.0  -0.2106
2  total_fat (%DV)  one-tailed     15.1907     23.9504  -12.5780      1.0  -0.4106

Interpretation: We use Welch's t-test because the honey-only and sugar-only dessert groups are highly imbalanced in size and we do not want to assume equal variances. To match our hypotheses, calories and total fat are tested one-tailed, while sugar %DV is tested two-tailed. Using the Bonferroni corrected significance threshold of alpha = 0.017, the one-tailed tests for calories and total fat are not significant in the hypothesized direction because the observed differences go the opposite way: honey-only desserts have lower mean calories (241.12 vs 328.97) and lower mean total fat %DV (15.19 vs 23.95) than sugar-only desserts. Specifically, the Welch test results are t = -12.22, p = 1.0000 for calories and t = -12.58, p = 1.0000 for total fat under the directional alternative that honey desserts are higher.

For sugar %DV, the two-tailed Welch test remains statistically significant, with t = -6.93 and p < 0.001, showing that the groups differ, and the observed direction is again lower for honey-only desserts (94.66 vs 114.96). The corresponding effect sizes are small to moderate by Cohen's conventional benchmarks (0.2 small, 0.5 medium), with Cohen's d = -0.39 for calories, d = -0.21 for sugar %DV, and d = -0.41 for total fat %DV.

Overall, we fail to reject the directional null hypotheses for calories and total fat, and we reject the null hypothesis for sugar %DV. In this filtered dataset, sugar-only desserts tend to be higher on all three nutritional measures than honey-only desserts, contrary to the original directional prediction for calories and total fat.
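
As a cross-check, the calorie results can be approximately reproduced from the summary statistics alone. This is a sketch under stated assumptions: the means are copied from the results table and the standard deviations from the descriptive statistics (both rounded), and it relies on the `alternative` argument of `scipy.stats.ttest_ind_from_stats`, which assumes SciPy 1.6 or newer. It is an alternative to the manual halving of the two-tailed p-value, not the notebook's own method.

```python
import numpy as np
from scipy import stats

# Calorie summary statistics copied (rounded) from the tables above.
honey_mean, honey_std, honey_n = 241.1173, 193.68, 750
sugar_mean, sugar_std, sugar_n = 328.9731, 224.31, 30236

# Welch's t-test from summary statistics; alternative="greater" tests H1: honey > sugar.
res = stats.ttest_ind_from_stats(
    honey_mean, honey_std, honey_n,
    sugar_mean, sugar_std, sugar_n,
    equal_var=False, alternative="greater",
)
t_stat, p_one_tailed = res.statistic, res.pvalue

# Cohen's d with the pooled standard deviation.
pooled_var = ((honey_n - 1) * honey_std**2 + (sugar_n - 1) * sugar_std**2) / (honey_n + sugar_n - 2)
cohen_d = (honey_mean - sugar_mean) / np.sqrt(pooled_var)

print(f"t = {t_stat:.2f}, one-tailed p = {p_one_tailed:.4f}, d = {cohen_d:.2f}")
```

Because the inputs are rounded, the output should agree with the table only to about two decimal places.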


Section 2: Permutation Test on the Difference of Means¶

We also want to consider the possibility that the differences in calories, sugar (%DV), and total fat (%DV) arise from random variation under the null hypothesis. To address this, we run a permutation test by repeatedly shuffling the dessert labels and computing the resulting differences in means. This lets us compare the observed difference in means with the distribution of differences expected if there were no true difference between the groups.

First, we will compute the observed differences in means for each nutritional variable:

In [ ]:
sugar_cal_mean = desserts[desserts["only_sugar"] == True]["calories"].mean()
honey_cal_mean = desserts[desserts["only_honey"] == True]["calories"].mean()
obs_diff_means_cal = honey_cal_mean - sugar_cal_mean
print(f"Observed Difference In Mean Calories Between Honey and Sugar: {obs_diff_means_cal}")

sugar_sugar_mean = desserts[desserts["only_sugar"] == True]["sugar (%DV)"].mean()
honey_sugar_mean = desserts[desserts["only_honey"] == True]["sugar (%DV)"].mean()
obs_diff_means_sugar = honey_sugar_mean - sugar_sugar_mean
print(f"Observed Difference In Mean Sugar Between Honey and Sugar: {obs_diff_means_sugar}")

sugar_fat_mean = desserts[desserts["only_sugar"] == True]["total_fat (%DV)"].mean()
honey_fat_mean = desserts[desserts["only_honey"] == True]["total_fat (%DV)"].mean()
obs_diff_means_fat = honey_fat_mean - sugar_fat_mean
print(f"Observed Difference In Mean Fat Between Honey and Sugar: {obs_diff_means_fat}")
Observed Difference In Mean Calories Between Honey and Sugar: -87.85571865767074
Observed Difference In Mean Sugar Between Honey and Sugar: -20.300498390439657
Observed Difference In Mean Fat Between Honey and Sugar: -8.759690523437845

Next, we perform 10,000 permutations of the group labels, computing the difference in means for each repetition.

In [ ]:
def diff_in_means(y, group):
    y = np.array(y)
    group = np.array(group)
    return y[group].mean() - y[~group].mean()

def permutation_test_diff_means(y, group, repetitions=10000):
    y = np.array(y)
    group = np.array(group)
    results = []
    for i in range(repetitions):
        shuffled = np.random.permutation(group)
        results.append(diff_in_means(y, shuffled))
    return results

perm_diff_calories = np.array(permutation_test_diff_means(desserts["calories"].values, desserts["only_honey"].values.astype(bool)))
perm_diff_sugar = np.array(permutation_test_diff_means(desserts["sugar (%DV)"].values, desserts["only_honey"].values.astype(bool)))
perm_diff_fat = np.array(permutation_test_diff_means(desserts["total_fat (%DV)"].values, desserts["only_honey"].values.astype(bool)))

We then calculate the p-value for calories, sugar (%DV), and fat (%DV) based on the permuted distributions.

In [ ]:
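# One-tailed (H1: honey > sugar): share of permuted differences at least as large as the observed one
p_val_calories = np.mean(perm_diff_calories >= obs_diff_means_cal)
p_val_fat = np.mean(perm_diff_fat >= obs_diff_means_fat)
# Two-tailed: share of permuted differences at least as extreme in absolute value
p_val_sugar = np.mean(np.abs(perm_diff_sugar) >= np.abs(obs_diff_means_sugar))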
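
For reproducibility, a permutation test can also be written with a seeded random generator. The sketch below is a variation on the approach above, not the notebook's implementation: the function name `permutation_pvalue`, the seed, the number of permutations, and the synthetic data are all assumptions. It also adds one to the numerator and denominator of the p-value, a common adjustment that keeps the estimate from being exactly zero.

```python
import numpy as np

def permutation_pvalue(y, is_honey, n_perm=2000, alternative="greater", seed=0):
    """Permutation test on the difference of means (honey minus sugar); hypothetical helper."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    is_honey = np.asarray(is_honey, dtype=bool)
    observed = y[is_honey].mean() - y[~is_honey].mean()

    perm = np.empty(n_perm)
    for i in range(n_perm):
        shuffled = rng.permutation(is_honey)  # shuffle the group labels
        perm[i] = y[shuffled].mean() - y[~shuffled].mean()

    if alternative == "greater":
        extreme = perm >= observed
    elif alternative == "less":
        extreme = perm <= observed
    else:  # two-sided
        extreme = np.abs(perm) >= abs(observed)

    # Add-one adjustment so the estimated p-value is never exactly zero.
    p_value = (extreme.sum() + 1) / (n_perm + 1)
    return observed, p_value

# Synthetic data (not the recipe data): a small group with a lower mean.
data_rng = np.random.default_rng(1)
y = np.concatenate([data_rng.normal(250, 50, 50), data_rng.normal(300, 50, 500)])
is_honey = np.concatenate([np.ones(50, dtype=bool), np.zeros(500, dtype=bool)])

obs, p_greater = permutation_pvalue(y, is_honey, alternative="greater")
_, p_less = permutation_pvalue(y, is_honey, alternative="less")
_, p_two = permutation_pvalue(y, is_honey, alternative="two-sided")
print(f"observed = {obs:.2f}, p_greater = {p_greater:.4f}, p_less = {p_less:.4f}, p_two = {p_two:.4f}")
```

Running the "less" alternative alongside "greater" makes it explicit when a large one-tailed p-value reflects an effect in the opposite direction rather than no effect.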

Finally, we plot the permutation results for each nutritional variable, with the red line indicating the observed difference in means.

In [ ]:
fig, axes = plt.subplots(1, 3, figsize=(18, 5))
perm_data = [(perm_diff_calories, obs_diff_means_cal, 'Calories', p_val_calories),
             (perm_diff_sugar, obs_diff_means_sugar, 'Sugar (%DV)', p_val_sugar),
             (perm_diff_fat, obs_diff_means_fat,   'Total Fat (%DV)', p_val_fat),]

for ax, (perm, obs, label, pval) in zip(axes, perm_data):
    ax.hist(perm, bins=30, edgecolor='black', color='steelblue', alpha=0.8)
    ax.axvline(obs, color='red', linewidth=2, label=f'Obs. Difference = {obs:0.2f}')
    if label == 'Sugar (%DV)':
        # Two-tailed test: also mark the mirrored observed difference
        ax.axvline(-obs, color='red', linestyle='--', label=f'Mirror = {-obs:0.2f}')
    ax.set_title(f'{label}\np-value = {pval:.2f}')
    ax.set_xlabel('Permuted Difference in Means')
    ax.set_ylabel('Frequency')
    ax.legend()

plt.suptitle('Permutation Test: Null Distributions vs Observed Differences', 
             fontsize=13, y=1.02)
plt.tight_layout()
plt.show()
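[Figure: "Permutation Test: Null Distributions vs Observed Differences"; histograms of permuted differences in means for Calories, Sugar (%DV), and Total Fat (%DV), with the observed difference marked in red]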

Interpretation: Based on the results of the permutation test:

  • Calories (p-value = 1.0): The observed difference in means is strongly negative, and opposite to the direction of the alternate hypothesis (honey > sugar). Therefore, we fail to reject the null hypothesis, as there is no statistical evidence that honey desserts contain more calories than sugar desserts.

  • Sugar (%DV) (p-value ≈ 0.0): The observed difference in means is negative, and the permutation test shows that a difference of this size is extremely unlikely under the null hypothesis of no difference. This indicates that the difference in sugar content between honey and sugar desserts is statistically significant.

  • Total Fat (%DV) (p-value = 1.0): Similar to calories, the observed difference is negative and opposite to the direction of the alternate hypothesis. As such, we fail to reject the null hypothesis, as there is no statistical evidence that honey desserts contain more fat than sugar desserts.

In conclusion, only sugar (%DV) shows a statistically significant result under its stated hypothesis, with honey desserts containing less sugar on average. For calories and total fat, the differences run opposite to the hypothesized direction, so the one-tailed tests are not significant. However, a one-tailed p-value of 1.0 means that essentially every permuted difference was larger than the observed one, so the observed differences sit in the extreme lower tail of the null distributions. This points to honey desserts being lower in calories and total fat, a difference in the opposite direction that is analyzed further in the next section.


Further Analysis: Does the Sweetener Alone Explain the Nutritional Gap?¶

Having completed hypothesis testing, we now conduct further analysis to better understand the underlying drivers of the nutritional differences observed between honey and sugar desserts. Specifically, we investigate whether the gap can be explained by the nutritional properties of the sweeteners themselves, or whether broader recipe-level factors and complexity cause the observed differences.

Given that honey-based desserts are consistently lower in calories, sugar, and fat compared to sugar-based desserts, we investigate whether these differences arise from the sweeteners themselves. If recipe authors merely substituted honey for sugar in otherwise identical recipes, the observed nutritional differences should roughly mirror what we would expect from a direct 1 tbsp for 1 tbsp swap. However, because honey is sweeter than sugar, bakers typically use less of it: 1 tbsp of honey replaces roughly 1.25 tbsp of sugar. To test this, we compare three scenarios for each metric:

  1. Theoretical (Unadjusted): The theoretical difference assuming a straight 1 tbsp honey for 1 tbsp sugar substitution
  2. Theoretical (Sweetness-Adjusted): The theoretical difference adjusted for honey's greater sweetness (where 1 tbsp honey replaces 1.25 tbsp sugar)
  3. Observed (Dataset): The actual observed mean difference between honey and sugar desserts in the dataset

All values are normalized to FDA % Daily Values to allow direct comparison across calories, sugar, and fat.

In [ ]:
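fda_dv_calories = 2000   # kcal, FDA reference daily intake
fda_dv_sugar_g = 50      # g, FDA Daily Value for added sugars
fda_dv_fat_g = 78        # g, FDA Daily Value for total fat
sweetness_factor = 1.25  # 1 tbsp honey replaces 1.25 tbsp sugar

# Per-tablespoon values used in this comparison:
# honey: 64 kcal and 17 g sugar; sugar: 48 kcal and 13 g sugar; neither contains fat
theoretical_unadj = {'calories (%DV)': ((64-48)/fda_dv_calories)*100,
                     'sugar (%DV)': ((17-13)/fda_dv_sugar_g)*100,
                     'total_fat (%DV)': 0.0}

theoretical_adj = {'calories (%DV)': ((64-(48*sweetness_factor))/fda_dv_calories)*100,
                   'sugar (%DV)': ((17-(13*sweetness_factor))/fda_dv_sugar_g)*100,
                   'total_fat (%DV)': 0.0}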

honey_df = desserts[desserts['primary_sweetener'] == 'Honey']
sugar_df = desserts[desserts['primary_sweetener'] == 'Sugar']

observed_diff = {'calories (%DV)': ((honey_df['calories'].mean() - sugar_df['calories'].mean())/fda_dv_calories)*100,
                 'sugar (%DV)': honey_df['sugar (%DV)'].mean() - sugar_df['sugar (%DV)'].mean(),
                 'total_fat (%DV)': honey_df['total_fat (%DV)'].mean() - sugar_df['total_fat (%DV)'].mean()}

metrics = ['calories (%DV)', 'sugar (%DV)', 'total_fat (%DV)']
labels = ['Calories (% DV)', 'Sugar (% DV)', 'Total Fat (% DV)']

unadj_vals = [theoretical_unadj[m] for m in metrics]
adj_vals = [theoretical_adj[m] for m in metrics]
observed_vals = [observed_diff[m] for m in metrics]

comparison_df = pd.DataFrame({'Metric': labels, 'Theoretical (Unadjusted)': unadj_vals,
                              'Theoretical (Sweetness-Adjusted)': adj_vals, 'Observed (Dataset)': observed_vals})
comparison_df
Out[ ]:
             Metric  Theoretical (Unadjusted)  Theoretical (Sweetness-Adjusted)  Observed (Dataset)
0   Calories (% DV)                       0.8                               0.2           -4.392786
1      Sugar (% DV)                       8.0                               1.5          -20.300498
2  Total Fat (% DV)                       0.0                               0.0           -8.759691
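
The theoretical rows can be verified by hand with a small helper. This is a minimal sketch: the function name `pct_dv_difference` and its arguments are hypothetical, while the per-tablespoon values (64 and 48 kcal, 17 and 13 g sugar) and the Daily Values (2000 kcal, 50 g) are the ones used in the cell above.

```python
def pct_dv_difference(honey_per_tbsp, sugar_per_tbsp, daily_value, sugar_tbsp=1.0):
    """Honey minus sugar, as % DV, when 1 tbsp honey replaces `sugar_tbsp` tbsp of sugar."""
    return (honey_per_tbsp - sugar_per_tbsp * sugar_tbsp) / daily_value * 100

# Calories: 64 kcal per tbsp honey vs 48 kcal per tbsp sugar, against a 2000 kcal DV.
calories_unadj = pct_dv_difference(64, 48, 2000)
calories_adj = pct_dv_difference(64, 48, 2000, sugar_tbsp=1.25)

# Sugar: 17 g per tbsp honey vs 13 g per tbsp sugar, against a 50 g DV.
sugar_unadj = pct_dv_difference(17, 13, 50)
sugar_adj = pct_dv_difference(17, 13, 50, sugar_tbsp=1.25)

print(f"Calories: unadjusted {calories_unadj:.2f}% DV, adjusted {calories_adj:.2f}% DV")
print(f"Sugar:    unadjusted {sugar_unadj:.2f}% DV, adjusted {sugar_adj:.2f}% DV")
```

Both theoretical differences are positive for calories and sugar, whereas the observed differences in the table are negative.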

Based on these three scenarios, we construct bar charts to assess whether the sweetener's own nutritional profile carries through to the recipe's overall nutrition. If the primary sweetener's nutritional profile dominated the recipe, the sweetness-adjusted theoretical difference and the observed difference would be similar in sign and magnitude.

In [ ]:
fig, axes = plt.subplots(1, 3, figsize=(18, 6))
metric_labels = ['Calories (% DV)', 'Sugar (% DV)', 'Total Fat (% DV)']
metric_keys = ['calories (%DV)', 'sugar (%DV)', 'total_fat (%DV)']
colors = ['#f4a261', '#e9c46a', '#2a9d8f']

for i, (metric, label) in enumerate(zip(metric_keys, metric_labels)):
    ax = axes[i]
    vals = [theoretical_unadj[metric], theoretical_adj[metric], observed_diff[metric]]
    bar_labels = ['Theoretical\n(Unadjusted)', 'Theoretical\n(Sweetness-Adj.)', 'Observed\n(Dataset)']
    bars = ax.bar(bar_labels, vals, color=colors, edgecolor='black', width=0.5)
    ax.axhline(0, color='black', linewidth=0.8, linestyle='--')
    ax.set_title(label, fontsize=13)
    ax.set_ylabel('Difference in % DV (Honey - Sugar)' if i == 0 else '')
    ax.set_ylim(
        min(vals) * 1.3 if min(vals) < 0 else -1,
        max(vals) * 1.3 if max(vals) > 0 else 1
    )
    for bar in bars:
        height = bar.get_height()
        ax.annotate(f'{height:.2f}%',
                    xy=(bar.get_x() + bar.get_width() / 2, height),
                    xytext=(0, 4 if height >= 0 else -12),
                    textcoords='offset points',
                    ha='center', fontsize=10)
plt.suptitle('Theoretical vs Observed Nutritional Difference (Normalized to % DV)\n', fontsize=13, y=1.02)
plt.tight_layout()
plt.show()
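[Figure: "Theoretical vs Observed Nutritional Difference (Normalized to % DV)"; bar charts of the unadjusted, sweetness-adjusted, and observed differences for Calories, Sugar, and Total Fat (% DV)]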

Interpretation: For calories, the unadjusted and sweetness-adjusted theoretical differences predict honey desserts should be +0.80% and +0.20% DV higher respectively, both of which are small positive differences. However, the observed difference is -4.39% DV, a complete directional reversal. A similar pattern holds for sugar (%DV), where theory predicts +8.00% (unadjusted) and +1.50% (sweetness-adjusted), yet the observed difference is -20.30% DV, a much larger difference in the opposite direction. Most strikingly, for total fat, both theoretical predictions are exactly 0.00% since neither sweetener contains fat, yet the observed difference is -8.76% DV. Since fat content is entirely independent of sweetener choice, this divergence cannot be attributed to the substitution at all.

Taken together, these results strongly suggest that the nutritional differences observed between honey and sugar desserts are not driven by the nutritional properties of the sweeteners themselves, but rather by broader recipe-level differences. One plausible explanation is that lighter preparations such as granola and energy bars tend to use honey, while more calorie-dense desserts like cakes and cookies tend to use sugar, although this analysis does not test that directly.

While restricting the analysis to recipes tagged as "desserts" was a deliberate choice intended to keep the two groups comparable, this analysis reveals a further split within desserts, with some types (in particular the honey-sweetened ones) being nutritionally lighter than others.


Ethics¶

A. Data Collection¶

  • A.1 Informed consent: If there are human subjects, have they given informed consent, where subjects affirmatively opt-in and have a clear understanding of the data uses to which they consent?

  • A.2 Collection bias: Have we considered sources of bias that could be introduced during data collection and survey design and taken steps to mitigate those?

    The dataset reflects recipes uploaded by users on Food.com (formerly GeniusKitchen) and cleaned by researchers at UC San Diego. Although a wide range of dietary preferences is captured in the tags, we acknowledge that the data may skew towards certain cultures and demographics, limiting how well our findings generalize.

  • A.3 Limit PII exposure: Have we considered ways to minimize exposure of personally identifiable information (PII) for example through anonymization or not collecting information that isn't relevant for analysis?

    We will only use recipe-level data (ingredients, tags, nutrition). We will not analyze usernames, reviews, or any identifiable user metadata. The analysis focuses on aggregate nutritional properties rather than individuals. Although the data does include user IDs, we will remove these during the wrangling process.

  • A.4 Downstream bias mitigation: Have we considered ways to enable testing downstream results for biased outcomes (e.g., collecting data on protected group status like race or gender)?

B. Data Storage¶

  • B.1 Data security: Do we have a plan to protect and secure data (e.g., encryption at rest and in transit, access controls on internal users and third parties, access logs, and up-to-date software)?

    The dataset is publicly available on Kaggle and does not contain sensitive personal data. It will be stored locally and we intend to maintain any change history on GitHub, specifically for coursework.

  • B.2 Right to be forgotten: Do we have a mechanism through which an individual can request their personal information be removed?

  • B.3 Data retention plan: Is there a schedule or plan to delete the data after it is no longer needed?

    The dataset is publicly accessible and local copies will likely be retained only for the duration of the course project. We might upload aggregated data but this will be clearly marked and identifiable as different from the raw data.

C. Analysis¶

  • C.1 Missing perspectives: Have we sought to address blindspots in the analysis through engagement with relevant stakeholders (e.g., checking assumptions and discussing implications with affected communities and subject matter experts)?

    We rely on users' own interpretation of features like 'healthy' and 'desserts', based on the tags they assign to their uploaded recipes, which we acknowledge might be a biased perspective.

  • C.2 Dataset bias: Have we examined the data for possible sources of bias and taken steps to mitigate or address these biases (e.g., stereotype perpetuation, confirmation bias, imbalanced classes, or omitted confounding variables)?

    We will examine class balance (honey vs sugar desserts), ingredient labeling consistency (checking for membership instead of absolute comparison), and missing nutrition values. Filtering decisions (e.g., defining a “primary sweetener”) may introduce bias, so these decisions will be documented and kept reproducible for downstream analysis.

  • C.3 Honest representation: Are our visualizations, summary statistics, and reports designed to honestly represent the underlying data?

    All summary statistics and visualizations will be reported with clear notes on assumptions, uncertainty, and limitations. The intention is to represent our data in the most intuitive way possible without distorting it.

  • C.4 Privacy in analysis: Have we ensured that data with PII are not used or displayed unless necessary for the analysis?

    We will not analyze or display any user information which may be used to personally identify such individuals. We aim to keep all results at the recipe level.

  • C.5 Auditability: Is the process of generating the analysis well documented and reproducible if we discover issues in the future?

    As we are completing all relevant work through GitHub and the corresponding Jupyter Notebooks, our results and findings should be easily accessible to anyone who wants to review the work later. We value transparency, which we hope will be visible through the GitHub repository and its commit history.
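The checks named in C.2 (class balance, membership-based ingredient matching, and missing nutrition values) can be sketched as follows. This is a minimal illustration on a made-up sample; the field names and the `sweetener_group` helper are hypothetical and are not taken from the notebook's actual schema.

```python
from collections import Counter

# Hypothetical mini-sample; field names are illustrative, not the notebook's schema.
recipes = [
    {"name": "oat bars", "ingredients": ["oats", "honey", "butter"], "calories": 210.0},
    {"name": "brownies", "ingredients": ["flour", "sugar", "cocoa"], "calories": 320.0},
    {"name": "sugar cookies", "ingredients": ["flour", "sugar", "egg"], "calories": None},
    {"name": "glazed cake", "ingredients": ["flour", "sugar", "honey"], "calories": 410.0},
]

def sweetener_group(ingredients):
    """Label a recipe by list membership rather than whole-value equality."""
    has_honey = "honey" in ingredients
    has_sugar = "sugar" in ingredients
    if has_honey and not has_sugar:
        return "honey"
    if has_sugar and not has_honey:
        return "sugar"
    return "excluded"  # both sweeteners present, or neither

# Class balance across the labeled groups.
groups = Counter(sweetener_group(r["ingredients"]) for r in recipes)

# Count of recipes with a missing calorie value.
missing_calories = sum(r["calories"] is None for r in recipes)
```

Exact membership would miss entries such as "brown sugar", so real ingredient lists would probably need some normalization before a check like this is applied.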

D. Modeling¶

  • D.1 Proxy discrimination: Have we ensured that the model does not rely on variables or proxies for variables that are unfairly discriminatory?

  • D.2 Fairness across groups: Have we tested model results for fairness with respect to different affected groups (e.g., tested for disparate error rates)?

  • D.3 Metric selection: Have we considered the effects of optimizing for our defined metrics and considered additional metrics?

    Although we selected calories, sugar, and fat as our operational definition of healthiness, we understand that many other factors can be taken into account with regard to physical health and diet.

  • D.4 Explainability: Can we explain in understandable terms a decision the model made in cases where a justification is needed?

    Our analytical methods will rely primarily on descriptive statistics, visualization, and classical statistical tests, which allow us to clearly explain differences in outcomes. We will also document our filtering decisions, statistical assumptions, and analysis steps to ensure transparency and interpretability.

  • D.5 Communicate limitations: Have we communicated the shortcomings, limitations, and biases of the model to relevant stakeholders in ways that can be generally understood?

    When communicating our results, we aim to be clear that these statistics are based solely on the specific recipes we analyzed. In doing so, we aim to avoid the perception that we are giving advice about personal diets.

E. Deployment¶

  • E.1 Monitoring and evaluation: Do we have a clear plan to monitor the model and its impacts after it is deployed (e.g., performance monitoring, regular audit of sample predictions, human review of high-stakes decisions, reviewing downstream impacts of errors or low-confidence decisions, testing for concept drift)?

  • E.2 Redress: Have we discussed with our organization a plan for response if users are harmed by the results (e.g., how does the data science team evaluate these cases and update analysis and models to prevent future harm)?

  • E.3 Roll back: Is there a way to turn off or roll back the model in production if necessary?

  • E.4 Unintended use: Have we taken steps to identify and prevent unintended uses and abuse of the model and do we have a plan to monitor these once the model is deployed?

    We would like to avoid instances in which our conclusions are interpreted as health recommendations. As such, we aim to present our results from a statistical standpoint, rather than as conclusions that should serve as guidelines for others.


Discussion and Conclusion¶

The aim of this project was to determine whether honey-sweetened desserts are meaningfully higher in calories, sugar, and total fat content than those that are sugar-sweetened, motivated by the aforementioned health halo effect.

Before comparing groups, we checked whether the three outcome measures were independent of one another by plotting a correlation heatmap. We found strong correlations between calories and fat (%DV) and between calories and sugar (%DV), with Pearson's r of 0.86 and 0.77 respectively. Fat (%DV) and sugar (%DV) were also correlated, though more weakly, at r=0.44. From this, we concluded that the three measures are not independent of one another.
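The pairwise coefficients behind a heatmap like this can be computed directly. The sketch below uses only the standard library and a small made-up sample. The column names are hypothetical, and the notebook itself most likely relied on a dataframe correlation method rather than a hand-written function.

```python
from itertools import combinations
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Made-up values for illustration only; not drawn from the Food.com data.
nutrition = {
    "calories": [150.0, 220.0, 310.0, 180.0, 400.0],
    "sugar_dv": [40.0, 70.0, 95.0, 60.0, 110.0],
    "fat_dv": [5.0, 12.0, 20.0, 6.0, 30.0],
}

# One coefficient per unordered pair of measures, i.e. the heatmap's off-diagonal cells.
pairwise_r = {
    (a, b): pearson_r(nutrition[a], nutrition[b])
    for a, b in combinations(nutrition, 2)
}
```

Because the three measures move together, the three hypothesis tests are not independent pieces of evidence, which is worth keeping in mind when reading them side by side.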

In our analysis, sugar-based desserts showed larger median values than honey-based desserts in every category measured. For calories, the observed mean was 278.9 kcal for sugar versus 186 kcal for honey. For sugar (%DV), the medians were 93 for sugar and 74 for honey; for fat (%DV), they were 18 and 8.

We also conducted Welch's t-tests and permutation tests on the difference of means. Welch's t-test gave a t-statistic of -12.2212 for mean calories in honey versus sugar desserts, indicating a difference in means that is very large relative to its standard error. Results were similar for sugar and fat, with t-statistics of -6.9343 and -12.5780 respectively. The permutation tests produced p-values of 1.0 for the difference in calories, ~0.0 for the difference in sugar (%DV), and 1.0 for the difference in fat (%DV). From this, we concluded that we should fail to reject our null for calories, reject our null for sugar (%DV), and fail to reject our null for fat (%DV). Across all three measured categories, the observed values were higher in sugar-only desserts than in honey-only desserts.
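A strongly negative t-statistic alongside a permutation p-value of 1.0 is what a one-sided test produces when the observed difference runs opposite to the alternative. The sketch below illustrates this on made-up data. It assumes, without confirmation from the notebook, that differences were computed as honey minus sugar and that the alternative was "honey is higher"; both function names are hypothetical.

```python
import random
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t-statistic and Welch-Satterthwaite df for mean(a) - mean(b)."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

def permutation_p_greater(a, b, n_perm=2000, seed=0):
    """One-sided p-value for H1: mean(a) > mean(b), by shuffling group labels."""
    rng = random.Random(seed)
    observed = mean(a) - mean(b)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = mean(pooled[:len(a)]) - mean(pooled[len(a):])
        if diff >= observed:  # shuffled difference at least as large as observed
            hits += 1
    return hits / n_perm

# Made-up calorie values in which every honey recipe is lighter than every sugar recipe.
honey = [150.0, 170.0, 185.0, 190.0, 200.0, 210.0]
sugar = [240.0, 260.0, 275.0, 280.0, 300.0, 310.0, 330.0, 290.0]

t_stat, df = welch_t(honey, sugar)                # negative: honey mean is lower
p_one_sided = permutation_p_greater(honey, sugar)  # close to 1 under this alternative
```

Under these assumptions, a p-value near 1.0 does not mean the groups are similar. It means the data give no support for honey being higher, which is consistent with sugar-based desserts having the larger means.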

The most interesting finding came from comparing the theoretical and observed differences. After accounting for honey's greater sweetness relative to sugar, a direct substitution would predict only small positive differences in calories and sugar content, and no difference in fat content. However, the observed differences were large and in the opposite direction, including an unexplained −8.76% DV gap in total fat, a nutrient neither sweetener contains. This highlights that the nutritional differences between groups are primarily driven by recipe-level composition rather than the sweetener itself. Consistent with our understanding of the health halo effect, honey tends to appear in lighter preparations such as granola bars, energy bites, and fruit-based desserts, while sugar dominates richer baked goods like cakes, cookies, and brownies. As such, the sweetener is a marker of recipe style, not a determinant of nutritional quality. Any observed differences, therefore, arise from individual recipe choices rather than inherent health benefits of honey.
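The shape of the substitution argument can be shown with rough arithmetic. Every constant below is an illustrative assumption rather than a value from the notebook: approximate per-gram reference figures for each sweetener, and a common kitchen swap of about three-quarters of a cup of honey for one cup of sugar. The `sweetener_totals` helper is hypothetical.

```python
# All constants are illustrative assumptions, not values taken from the notebook.
SUGAR = {"kcal_per_g": 3.87, "sugar_per_g": 1.00, "fat_per_g": 0.0}
HONEY = {"kcal_per_g": 3.04, "sugar_per_g": 0.82, "fat_per_g": 0.0}

SUGAR_GRAMS = 200.0  # assumed: roughly one cup of granulated sugar
HONEY_GRAMS = 255.0  # assumed: roughly three-quarters of a cup of honey

def sweetener_totals(profile, grams):
    """Total kcal, sugar (g), and fat (g) contributed by `grams` of a sweetener."""
    return {key.replace("_per_g", ""): value * grams for key, value in profile.items()}

sugar_totals = sweetener_totals(SUGAR, SUGAR_GRAMS)
honey_totals = sweetener_totals(HONEY, HONEY_GRAMS)

# Predicted honey-minus-sugar difference for the whole recipe, before dividing by servings.
predicted_diff = {k: honey_totals[k] - sugar_totals[k] for k in sugar_totals}
```

With these assumed figures the predicted calorie and sugar differences are small and positive and the fat difference is zero, so a swap of this kind could not by itself produce a large negative gap in fat.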

This work has considerable limitations, which arise primarily from the public nature of the data source. As the dataset is a repository of publicly available recipes on Food.com, there is a significant class imbalance of roughly 40:1 between sugar-sweetened and honey-sweetened desserts. Moreover, as recipes are user-defined, it is difficult to standardize serving sizes, and the daily values might not be accurate representations of nutritional content. Additionally, filtering by the "desserts" tag introduces ambiguity, as different users might have different definitions of what they consider a dessert, with some even misclassifying their recipes. Finally, while we filtered out recipes that contained both honey and sugar, we acknowledge that using ingredient presence rather than raw quantity may not accurately represent nutritional composition.

As an extension of this work, we would address these limitations by applying stricter data filtering and including supplementary consumption data, such as the NHANES datasets. The study could also benefit from being narrowed down to specific dishes as focal points of comparison, such as cookies or cakes, which would help isolate the effect of using either honey or sugar while controlling for dish type.
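A dish-level comparison of this kind could start by computing the honey-minus-sugar difference within each dish type instead of across all desserts. The sketch below uses made-up rows, and the dish labels are hypothetical since the dataset would first need a reliable way to assign them.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: (dish_type, sweetener, calories). All values are made up.
rows = [
    ("cookie", "honey", 180.0), ("cookie", "honey", 200.0),
    ("cookie", "sugar", 210.0), ("cookie", "sugar", 190.0),
    ("cake", "honey", 330.0), ("cake", "sugar", 350.0), ("cake", "sugar", 370.0),
]

# Group calorie values by dish type, then by sweetener.
by_stratum = defaultdict(lambda: defaultdict(list))
for dish, sweetener, kcal in rows:
    by_stratum[dish][sweetener].append(kcal)

# Honey-minus-sugar mean difference within each dish type that has both groups.
within_dish_diff = {
    dish: mean(groups["honey"]) - mean(groups["sugar"])
    for dish, groups in by_stratum.items()
    if groups.get("honey") and groups.get("sugar")
}
```

Comparing like with like in this way removes much of the recipe-style difference between groups, so any remaining gap is easier to attribute to the sweetener.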