NutriSight
NutriSight is an AI-powered data-extraction system that automates the reading of nutrition tables from photos of food packaging, transforming them into structured, reliable data that feeds directly into Open Food Facts – the world’s largest open, community-driven food database. Open Food Facts is a non-profit initiative hosting information on more than four million products contributed by tens of thousands of volunteers across over 160 countries, all freely available under open-data and open-source licences. Its public APIs and datasets are reused by hundreds of food, health, and sustainability applications worldwide.
By combining computer vision, optical character recognition (OCR), and layout detection, NutriSight replaces one of the most time-consuming manual steps in that ecosystem – the transcription of nutrition tables – enabling contributors and partner apps to capture accurate nutrition data in seconds rather than minutes. Developed by Open Food Facts in collaboration with El CoCo, the AI model operates across languages and label formats, achieving high accuracy (precision 0.95 / recall 0.96) while remaining fully open-source, auditable, and privacy-preserving. Integrated into Open Food Facts’ mobile app, website, and annotation tools, NutriSight assists both expert contributors and everyday users in validating and enriching the shared database – ensuring that trustworthy, up-to-date nutrition information becomes available faster to everyone who depends on it.
By improving both the speed and accuracy of nutrition data capture, NutriSight strengthens the quality and reliability of the Open Food Facts database as it expands to reflect a constantly evolving food system. Rather than replacing human effort, it supports contributors and consumers in assigning data correctly within the appropriate schema, across different languages, regions, and packaging formats. In this way, NutriSight helps keep open food data accurate, current, and accessible for all who rely on it – from developers and researchers to citizens and policymakers – reinforcing transparency and fairness across the digital food ecosystem.
WHY THIS MATTERS
Reliable nutrition data is essential for building healthier, more transparent food systems. Every consumer app, scientific study, or public policy measure that compares, scores, or monitors food quality depends on accurate product-level information. Yet much of this data – while already digitised by producers – remains inaccessible in a unified, affordable form for those building public-interest food applications, leaving open, community-driven projects to fill the gap through the manual transcription of product labels into open-access databases. Across the global market, this information still begins its digital life as a photograph of a label, entered manually – one product at a time.
As new products appear daily in multiple languages and formats, keeping this information complete and correct has become a major challenge for Open Food Facts. NutriSight directly addresses a key aspect of this bottleneck – data capture – by using AI-based scanning to read and interpret nutrition tables on food packaging, allowing data to be added faster, more accurately, and in more languages than ever before.
Within the Open Food Facts ecosystem, NutriSight enhances both the quality and quantity of nutritional data available to all. Its integration supports contributors and partner organisations such as El CoCo, whose mobile app helps consumers in Spain and beyond make informed food choices. By replacing repetitive manual entry with assisted scanning, NutriSight allows annotators and app users alike to focus on verification and context rather than transcription.
These improvements extend far beyond the database itself. Richer, verified, and multilingual nutrition data enables downstream applications to provide more accurate Nutri-Scores, diet tracking, and sustainability insights. In turn, it empowers consumers to make healthier, evidence-based choices, and gives innovators a stronger foundation on which to build fair and transparent food-tech solutions.
By reinforcing the accuracy, inclusiveness, and openness of a shared digital commons, NutriSight strengthens the entire ecosystem of actors working toward a healthier and more trustworthy food system.
WHAT THE SOLUTION DOES
Behind every product barcode in Open Food Facts lies a nutrition label that must be turned into structured, comparable digital data. Before NutriSight, volunteer annotators or partner applications had to perform a simple but painstaking task: transcribing the small print on packaging – calories, fats, sugars, salt – line by line, from photographs taken in supermarkets or kitchens around the world.
NutriSight changes that experience completely. When an image of a product’s nutrition table is uploaded, it is scanned by an AI model trained to recognise layout patterns and extract all relevant nutrition values. After clicking an “extract” button, the prediction appears on screen – pre-filled and highlighted – so that annotators can simply verify, correct, or approve it instead of typing each value from scratch.
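To give a concrete (and purely illustrative) picture, a prediction handed to the review interface might be structured along the following lines. The field names, values, and review threshold in this sketch are hypothetical, not the actual NutriSight or Robotoff schema:

```python
# Hypothetical sketch of an extracted nutrition table as structured data;
# field names, values, and confidences are illustrative, not the actual
# NutriSight/Robotoff output schema.
prediction = {
    "energy-kcal_100g": {"value": 539.0, "unit": "kcal", "confidence": 0.98},
    "fat_100g":         {"value": 30.9,  "unit": "g",    "confidence": 0.97},
    "sugars_100g":      {"value": 56.3,  "unit": "g",    "confidence": 0.95},
    "salt_100g":        {"value": 0.107, "unit": "g",    "confidence": 0.62},
}

# Low-confidence values could be highlighted so a contributor
# double-checks them before approving the record.
REVIEW_THRESHOLD = 0.90
needs_review = {k: v for k, v in prediction.items()
                if v["confidence"] < REVIEW_THRESHOLD}
print(needs_review)  # here: only salt_100g would be flagged for review
```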
For the Open Food Facts community, this reduces one of the most repetitive and time-consuming steps of data entry while improving consistency across thousands of contributors. The same assisted-scanning feature is also being prepared for integration into partner apps such as El CoCo, where consumers will be able to contribute new data as they shop: scanning an unrecognised product, confirming the detected values, and directly enriching the shared database.
This shift from manual transcription to assisted data validation makes participation faster, easier, and more accurate. It turns both expert annotators and everyday users into active contributors to a global public resource – one that reflects new products, packaging formats, and languages as they appear.
By bringing AI directly into the contributor workflow, NutriSight keeps humans firmly in control while ensuring that the benefits of digitalisation are shared fairly. It helps maintain a food data ecosystem that grows with the same speed and diversity as the food system itself – open, inclusive, and accessible to all who build or depend upon it.
HOW YOU CAN USE IT
Within Open Food Facts, NutriSight is directly embedded in the contributor workflow – intelligently analysing uploaded images and pre-filling nutrition values to reduce manual data entry. It also has the potential to be integrated as an enabler within partner applications such as El CoCo, where consumers can scan unrecognised products and contribute verified nutrition data in real time. Beyond Open Food Facts, the same model, dataset, and codebase are openly available for adaptation in other contexts – from food-tech startups and research teams to developers building transparency tools that require reliable extraction of nutrition information from packaging images. Its outputs are designed for several stakeholders engaging with food information in distinct ways:
For developers, researchers, and innovators: NutriSight provides a practical foundation for building or improving food-data solutions. Within Open Food Facts, the model operates behind the public API, automatically extracting and validating nutrition values from packaging images. Applications already connected to the database benefit immediately from more accurate and complete data.
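As a minimal sketch of how an application can read nutrition data from the database, the public v2 product API returns a `nutriments` object whose keys follow the `<nutrient>_100g` convention (the barcode below is simply a well-known example product):

```python
import requests

# Fetch a product record from the public Open Food Facts API.
# The barcode is an example; any barcode from a listed product works.
barcode = "3017620422003"
url = f"https://world.openfoodfacts.org/api/v2/product/{barcode}.json"
resp = requests.get(url, headers={"User-Agent": "nutrisight-demo/0.1"}, timeout=10)
resp.raise_for_status()

nutriments = resp.json().get("product", {}).get("nutriments", {})

# Nutrient keys follow the "<nutrient>_100g" convention.
for key in ("energy-kcal_100g", "fat_100g", "sugars_100g", "salt_100g"):
    print(key, nutriments.get(key))
```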
For research and public-health institutions: The multilingual NutriSight dataset and model outputs provide an open and verifiable resource for advancing research in nutrition data extraction and Document AI performance. The dataset represents the first public collection of annotated nutrition tables – large-scale, multilingual, and professionally verified. It serves as a valuable benchmark for the research community, supporting studies in layout recognition, model accuracy across languages and label formats, and broader investigations into data quality and transparency in the food domain.
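For instance, the dataset can be pulled straight into a benchmarking pipeline with the Hugging Face `datasets` library. The dataset identifier below is a placeholder, so check the link in the Toolbox section for the published name:

```python
from datasets import load_dataset

# The identifier is a placeholder -- substitute the dataset name
# published on the Open Food Facts Hugging Face page.
ds = load_dataset("openfoodfacts/nutrient-extraction")  # placeholder ID

print(ds)                 # available splits and features
sample = ds["train"][0]   # one annotated nutrition-table example
print(sample.keys())
```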
By making its tools, data, and documentation openly accessible, NutriSight extends its reach beyond Open Food Facts – enabling others to strengthen accuracy, openness, and trust in how nutrition information is captured, shared, and reused.
DIGITAL RESPONSIBILITY IN PRACTICE
While Open Food Facts has long been guided by principles of openness, security, and privacy, the NutriSight project provided an opportunity to put specific aspects of the Digital Responsibility Goals (DRGs) into practice. Its most tangible contributions relate to data fairness, trustworthy AI, and human agency, where clear, demonstrable measures were applied during dataset creation, model development, and integration:
NutriSight’s dataset was built to reflect the diversity of real-world food packaging and labelling across markets. Images were sourced from the global Open Food Facts community and annotated by professional annotators through a two-step verification process, ensuring accuracy and consistency, with final validation by a machine-learning engineer. To mitigate potential bias, the team monitored performance across languages and label formats. By openly publishing the resulting dataset, the project contributes to fair and equitable access to high-quality AI training data. This openness helps level the playing field between commercial and community actors in food data innovation.
The NutriSight model is built on LayoutLM v3, a state-of-the-art architecture for document understanding. Its design emphasises reproducibility and accountability: both the dataset and training pipeline are openly available on GitHub and Hugging Face, allowing others to review, replicate, and evaluate its performance.
Model outputs include confidence scores for each extracted nutrition value, enabling contributors to verify uncertain predictions before approval. Continuous benchmarking across languages and packaging types ensures that reliability is maintained as the dataset grows. By keeping its AI transparent and auditable, NutriSight provides a replicable example of responsible model development within an open-data ecosystem.
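A minimal sketch of how such per-value confidences can be derived from a LayoutLMv3 token classifier is shown below. It uses the generic `microsoft/layoutlmv3-base` checkpoint purely as a stand-in, so substitute the published NutriSight weights and label set for real use; the base processor’s built-in Tesseract OCR is assumed here for simplicity:

```python
import torch
from PIL import Image
from transformers import LayoutLMv3Processor, LayoutLMv3ForTokenClassification

# Stand-in checkpoint: replace with the NutriSight weights on Hugging Face.
# With the base checkpoint, the classification head is untrained, so the
# printed labels are meaningless -- this only illustrates the mechanics.
MODEL_ID = "microsoft/layoutlmv3-base"
processor = LayoutLMv3Processor.from_pretrained(MODEL_ID)  # runs OCR internally
model = LayoutLMv3ForTokenClassification.from_pretrained(MODEL_ID)

image = Image.open("nutrition_table.jpg").convert("RGB")
inputs = processor(image, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, seq_len, num_labels)

# A softmax over the label dimension yields a per-token confidence,
# analogous to the per-value scores NutriSight surfaces for review.
probs = logits.softmax(dim=-1)
confidences, label_ids = probs.max(dim=-1)
for conf, label_id in zip(confidences[0], label_ids[0]):
    label = model.config.id2label[label_id.item()]
    print(f"{label:15s} confidence={conf.item():.2f}")
```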
NutriSight automates the extraction of nutrition data, but human contributors remain central to the process. Every detected value is presented to the user for review, correction, or confirmation before being added to the Open Food Facts database. This human-in-the-loop design preserves agency, accountability, and trust – ensuring that automation supports, rather than replaces, human judgement.
CONTRIBUTION TO THE TOOLBOX
The NutriSight project contributes directly to the DRG4FOOD Toolbox through a set of open, reproducible, and verifiable technical resources – every model, dataset, and script is publicly accessible, documented, and independently testable. These outputs demonstrate how responsible AI can be applied in practice to improve the accessibility and quality of nutrition data, providing reusable tools for developers, researchers, and public-interest innovators.
- NutriSight Model (LayoutLM v3-based)
A publicly released model trained to extract nutrition values from packaging images, capable of recognising multilingual label layouts and returning structured data with confidence scores.
View nutrition extraction model on Hugging Face
- Annotated Multilingual Dataset
The first open dataset of professionally annotated nutrition tables, sourced from the global Open Food Facts community and covering diverse packaging layouts and languages. Designed for benchmarking document-AI performance and retraining models in other contexts.
View nutrient extraction dataset on Hugging Face
- Open Source Code and Training Scripts
All model training scripts, annotation tools, and demo utilities are available on GitHub, allowing developers to retrain, evaluate, or extend NutriSight for new data domains.
View source code on GitHub
- Integration within Open Food Facts (Robotoff Backend & API)
The model is deployed in production through the Open Food Facts Robotoff machine-learning backend and accessible via the Predict API, enabling real-time nutrition extraction from uploaded product images (a hedged request sketch follows this list).
Explore the Open Food Facts Predict API
- Scientific Documentation and Evaluation Paper
A comprehensive technical paper describing the dataset, model architecture, and performance evaluation, published for transparency and reproducibility.
Read NutriSight paper on GitHub
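As noted above, here is a hedged sketch of querying the Robotoff backend for a nutrient prediction. The endpoint path and parameter names are assumptions modelled on Robotoff’s v1 API conventions, so treat the linked Predict API documentation as authoritative:

```python
import requests

ROBOTOFF = "https://robotoff.openfoodfacts.org/api/v1"

# Assumed endpoint and parameter names -- verify against the
# Robotoff / Predict API documentation before relying on them.
# The ocr_url points at an example OCR JSON from the Open Food Facts
# image server (barcode path segments are illustrative).
resp = requests.get(
    f"{ROBOTOFF}/predict/nutrient",
    params={"ocr_url": "https://images.openfoodfacts.org/images/products/301/762/042/2003/1.json"},
    timeout=10,
)
resp.raise_for_status()
print(resp.json())  # predicted nutrient values, ideally with confidences
```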
Together, these contributions provide a complete, openly licensed framework for AI-assisted nutrition data extraction – from dataset and model to deployment and documentation. By aligning open science with responsible AI, NutriSight strengthens the DRG4FOOD Toolbox with practical, reusable assets that advance fairness, transparency, and data accessibility across the food system.
IMPACT AND OUTLOOK
By accelerating the extraction and validation of nutrition data, NutriSight strengthens one of the most widely used open resources in the global food ecosystem – Open Food Facts. The model’s integration into the contributor workflow reduces repetitive manual entry, enabling faster updates and higher data accuracy across millions of products. This, in turn, benefits hundreds of downstream applications that depend on Open Food Facts for nutrition transparency, sustainability insights, and consumer information.
Beyond immediate integration, the project’s open-source release of its model, dataset, and training pipeline marks a significant contribution to the wider research and innovation community. By publishing the first openly available, professionally annotated dataset of nutrition tables for AI training, NutriSight has lowered entry barriers for developers and researchers working on Document AI solutions in the food domain.
As the Open Food Facts ecosystem continues to evolve, NutriSight’s methods and resources are expected to support future innovations in automated product recognition, data validation, and multilingual label processing. The ongoing collaboration with partners such as El CoCo demonstrates how these tools can be extended into consumer-facing applications, allowing users not only to access but also to contribute to trustworthy food information systems.
In the broader DRG4FOOD context, NutriSight stands out as an enabling project: a concrete demonstration of responsible AI applied to open, verifiable data infrastructures. Its continued refinement and reuse will help ensure that open food data remains accurate, inclusive, and accessible – supporting a more transparent and equitable digital food system.
QUICK FACTS
- Funding: DRG4FOOD Open Call #1
- Use case: Consumer Food Choices
- Partners: Open Food Facts, El CoCo
- Start date: Apr 2024
- End date: Apr 2025
- Resources