Content Site

WHFoods.info | Nutrition Reference Library

2,100+ page nutrition reference library rescued from the Internet Archive and rebuilt as a modern Astro static site. 135 food profiles, 355 recipes, 43 nutrient pages, and 15 meal plans with multi-source nutrient data merged from USDA, CIQUAL, and NIH databases. Ad-free, cookie-free, tracking-free.

View Live Site

Astro TypeScript Fuse.js Cheerio Schema.org

WHFoods.info | Nutrition Reference Library screenshot 1

WHFoods.info | Nutrition Reference Library screenshot 2

Pages

Foods

Recipes

Nutrients

The problem

Nutrition information online is scattered, ad-laden, and contradictory. Search for “salmon nutrition” and you get SEO-optimized listicles selling supplements, calorie counters with no context, and academic databases that require a biochemistry degree to parse. The original World’s Healthiest Foods research was different. It was evidence-based, detailed, and written for real people. But the site was stuck on aging WordPress infrastructure, and when that infrastructure failed, years of high-quality nutrition research started disappearing from the internet. The Internet Archive had snapshots. The clock was ticking.

What I built

WHFoods.info is a complete nutrition reference library rescued from the Internet Archive and rebuilt as a modern static site. It contains 135 food profiles with nutrient density ratings, 355 recipes with prep times and serving sizes, 43 nutrient reference pages showing which foods are richest in each nutrient, 15 condition-specific meal plans for diabetes, hypertension, osteoporosis, and more, plus 225 articles and 30 FAQs. The entire corpus of 2,100+ pages builds in roughly 35 seconds and ships with zero ads, zero cookies, and zero tracking scripts.

The recovery pipeline

The Internet Archive’s CDX API was the starting point. I queried it to discover every URL the Wayback Machine had captured for the original site, then built a rate-limited scraper with exponential backoff to pull down each snapshot without hammering Archive.org’s servers. The raw HTML went through a Cheerio-based parser that extracted content from the original page structure, stripping navigation chrome, ads, and layout markup while preserving the nutritional tables and cross-reference links that make the content valuable.

The Wayback Machine captured different pages at different times, so the same food page might exist in a 2019 snapshot and a 2022 snapshot with different nutrient data. I built a multi-era merging system that reconciles these differences, preferring the most recent data while flagging conflicts for manual review. A debranding rules engine then strips all references to the original organization, replacing them with neutral language so the preserved content stands on its own.

Architecture

The nutrient data pipeline is the most complex piece of engineering on the site. Each food profile draws from four databases: USDA FoodData Central for macronutrients and standard vitamins, CIQUAL 2025 for European reference values, the USDA Flavonoid Database for polyphenol data, and NIH Office of Dietary Supplements for recommended daily allowances and upper limits. When any two sources disagree by more than 10%, the merge engine flags the conflict. I resolve flagged values through hand-curated corrections backed by peer-reviewed sources.

The food pages themselves use a tab navigation system with sections for Nutrients, About, Recipes, Full Profile, Related Foods, Articles, FAQs, and References. Nutrient tables show raw vs. cooked comparison data, daily value percentages, and a rating system (Excellent, Very Good, Good) based on nutrient density per calorie.

Fuse.js powers a client-side fuzzy search that spans all 2,100+ pages across every content type: foods, recipes, nutrients, meal plans, articles, and FAQs. Type “vitamin C” and you get the nutrient page, the richest food sources, recipes that preserve it, and articles explaining absorption. Schema.org structured data marks up every page with Article, BreadcrumbList, and NutritionInfo types for rich search results.

The visual design uses a two-pillar color system: green tones for reference content (food profiles, nutrient pages) and brown tones for cooking content (recipes, meal plans). Typography pairs Georgia for headings with system sans-serif for body text to keep load times minimal while maintaining readability across dense nutrient tables.

Content architecture

The site manages 24 content types, all cross-linked. A food page for salmon links to its best nutrient scores, the recipes that use it, the meal plans it appears in, and the articles that reference it. A nutrient page for Omega-3 lists the richest food sources ranked by density. A meal plan for hypertension connects to the specific foods, recipes, and nutrients that support cardiovascular health. Every connection is bidirectional, so users can enter from any content type and navigate the full reference network.

This cross-referencing required careful data modeling to avoid circular dependencies and duplicate content. Each content type owns its primary data and references other types by ID. The Astro build resolves these references at compile time, generating the correct links and preview cards for each cross-reference. The result is a site where every page is a hub, not a dead end.

Testing and deployment

I built the data pipeline and content parsers using TDD methodology with Vitest. Real HTML fixtures from the Wayback Machine snapshots serve as test inputs, so the parser tests validate against actual archived markup rather than synthetic examples. This caught edge cases in the original site’s HTML that no amount of manual testing would have surfaced.

GitHub Actions handles CI/CD. Every push triggers a full build of all 2,100+ pages, runs the test suite, and deploys to cPanel via rsync. Cloudflare provides CDN caching and SSL termination.

What makes it different

No ads. No tracking. No cookies. No newsletter popups. No “accept cookies” banners. Just 2,100 pages of evidence-based nutrition research, permanently preserved and freely accessible. The nutrient data comes from four authoritative databases with automated conflict detection, not from a single source that might be wrong. Every food profile shows raw vs. cooked comparisons because cooking changes everything about a food’s nutrient profile. The meal plans target specific health conditions with specific foods and recipes, not generic “eat healthy” advice.

This is what happens when you treat nutrition content as a data engineering problem instead of a content marketing opportunity.

Like what you see?

I build tools that solve real problems. If you have an idea or a project that needs engineering, let's talk.

Get in Touch