Feeding a deep-learning system data from a depth-sensing camera could cut malnutrition in long-term care facilities

Designed to tackle the issue of estimating residents' nutritional intake, this encoder-decoder food network with depth-refinement (EDFN-D) peers at plates in three dimensions to cut out human subjectivity errors.


10 Feb, 2022

Pulling data from a 3D camera, this model can estimate just how much food is left on a plate.


Malnutrition, or the risk of it, affects as many as 54 per cent of older residents in long-term care facilities, according to a 2019 paper by Heather H. Keller and colleagues, with knock-on effects on morbidity, quality of life, cognitive faculties, and more. It’s also a problem that’s hard to solve: Facilities can cook healthy and nutritious meals, but there’s no guarantee residents will eat them, and monitoring each resident’s nutritional intake is a time-consuming and error-prone process.

Machine learning could offer some relief, a team from the University of Waterloo has demonstrated, by taking on the problem of monitoring exactly how much food, and what types of food, residents are consuming at mealtimes. By processing images captured by a 3D depth-sensing camera, the approach takes human subjectivity out of the equation while cutting the time it takes to record consumption data.

Happy, empty plates

“Right now, there is no way to tell whether a resident ate only their protein or only their carbohydrates,” says first author Kaylen J. Pfisterer, who worked with Keller and colleagues on the problem. “Our system is linked to recipes at the long-term care home and, using artificial intelligence, keeps track of how much of each food was eaten to make sure residents are meeting their specific nutrient requirements.”

Traditionally, food intake at long-term care facilities is monitored by eye. Staff take a look at finished plates and estimate how much of the meal was consumed, fitting them into as few as five bins: Zero per cent, 25 per cent, 50 per cent, 75 per cent, and 100 per cent.

It’s an approach that suffers from subjectivity and is notably error-prone: A 2002 paper by Victoria Hammer Castellanos and Yvette N. Andrews found that just 44 per cent of estimations were accurate, dropping to 38 per cent if they weren’t recorded immediately.
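Even a perfectly attentive observer loses information when a continuous quantity is forced into 25-per-cent steps. The following is a minimal sketch of that effect, not anything taken from the paper: the function names `nearest_bin` and `quantisation_error` are hypothetical, and it assumes an observer who always picks the closest of the five bins listed above.

```python
# Hypothetical illustration: snap a true intake percentage to the nearest
# of the five bins used when plates are assessed by eye.
BINS = (0, 25, 50, 75, 100)


def nearest_bin(intake_percent: float) -> int:
    """Return the bin closest to the true intake percentage."""
    return min(BINS, key=lambda b: abs(b - intake_percent))


def quantisation_error(intake_percent: float) -> float:
    """Error introduced by binning alone, in percentage points."""
    return nearest_bin(intake_percent) - intake_percent


if __name__ == "__main__":
    for true_intake in (10.0, 37.0, 62.0, 90.0):
        print(true_intake, nearest_bin(true_intake), quantisation_error(true_intake))
```

Under these assumptions a resident who ate 62 per cent of a meal is recorded at 50 per cent, a 12-point shortfall before any human misjudgement is added on top.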

Images of several different meal types, with the background and plate removed; each meal is shown in various stages of being consumed. The team built custom datasets, based on real meals served at long-term care facilities.

The answer, according to Pfisterer and colleagues: Removing the human element and turning the problem over to a computer, which processes data from a depth-sensing camera and calculates with precision not only how much of the meal in total was consumed but how much of each food category. That provides a far clearer look at the nutritional uptake of residents, without increasing the workload on staff.

The team admits it’s not the only possible approach, but claims it offers a range of benefits over alternatives when applied to long-term care facilities — including reduced cost and increased comfort compared to fitting every resident with wearable sensors, which would also require regular sterilization, and avoiding the issue of having residents feel watched, which is known to actively trigger a reduced intake.

Where and how much

The system developed by the researchers, a deep convolutional encoder-decoder food network with depth-refinement (EDFN-D), aims to answer two questions: where food is present on the plate, and how much food there is. It ignores, for now, the problem of categorizing different food types, a problem the team says has been well-addressed elsewhere.

Where previous efforts at food segmentation, which identifies what parts of a plate are covered in food, rely on taking multiple images from a number of perspectives or a single image with a fiducial marker, the system detailed in the paper requires only a single camera: a 3D depth-sensing unit produced by Intel under its RealSense business arm.

Using the camera, the system was trained on the UNIMIB2016 food dataset then tested across two novel datasets created specifically for the project: One showing food types chosen from real-world long-term care facility menus in their standard form, and another using modified textures — food types that have been minced or pureed, for residents who need it.

A chart showing volume error rates for the standard and depth-enhanced versions of the EDFN for different amounts of food left on a plate; the depth-enhanced version appears superior. Adding depth enhancement to the network proved key in reducing error rates.

The camera’s ability to record depth proved key, avoiding the problem where some food types, such as tomato sauce, could be piled high or spread thinly, easily confusing a two-dimensional system into thinking there was more or less of the food present than was actually the case. It also proved able to dramatically improve the overall estimation, offering an error margin under 10 per cent, compared to 62 per cent under human observation.
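The piled-versus-spread problem can be made concrete with a small sketch. This is not the team's implementation: it assumes a per-pixel height map above the plate surface (in millimetres) and a binary food mask are already available, and the helper names `food_area_mm2` and `food_volume_ml` and the 10 mm by 10 mm pixel footprint are hypothetical values chosen for illustration.

```python
# Hypothetical sketch: compare a 2D area estimate with a depth-based volume
# estimate for the same food, piled high versus spread thin.

def food_area_mm2(mask, pixel_area_mm2):
    """Area covered by food: count of masked pixels times pixel footprint."""
    return sum(1 for row in mask for is_food in row if is_food) * pixel_area_mm2


def food_volume_ml(height_mm, mask, pixel_area_mm2):
    """Sum height above the plate over food pixels (1000 mm^3 = 1 mL)."""
    total_mm3 = 0.0
    for h_row, m_row in zip(height_mm, mask):
        for h, is_food in zip(h_row, m_row):
            if is_food:
                # Negative heights are treated as sensor noise and clamped.
                total_mm3 += max(h, 0.0) * pixel_area_mm2
    return total_mm3 / 1000.0


PIXEL_AREA = 100.0  # assumed footprint: 10 mm x 10 mm per pixel

# Sauce piled high: 4 food pixels, each 20 mm tall.
piled_mask = [[1, 1, 0, 0], [1, 1, 0, 0]]
piled_height = [[20.0, 20.0, 0.0, 0.0], [20.0, 20.0, 0.0, 0.0]]

# Sauce spread thin: 8 food pixels, each 5 mm tall.
spread_mask = [[1, 1, 1, 1], [1, 1, 1, 1]]
spread_height = [[5.0] * 4, [5.0] * 4]

if __name__ == "__main__":
    print(food_area_mm2(piled_mask, PIXEL_AREA), food_volume_ml(piled_height, piled_mask, PIXEL_AREA))
    print(food_area_mm2(spread_mask, PIXEL_AREA), food_volume_ml(spread_height, spread_mask, PIXEL_AREA))
```

In this toy case the spread sauce covers twice the area of the piled sauce yet holds half the volume, which is the kind of mismatch an area-only estimate cannot detect.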

The biggest problem: Salad. Made up of individual leaves and offering an overall low density, the same apparent volume of salad on each plate could represent a widely different ratio of food to air pockets — something the researchers suggest could be improved with a training dataset in which salad is better represented.

“While our proposed system is not error-free,” the team admits, “it is significantly more accurate than current LTC [Long Term Care] methods. With depth-refinement, our proposed EDFN-D removes […] subjectivity, operates on a continuous scale, and has a mean 3D percentage intake of estimation error of -4.2 per cent across both datasets and a mean volume intake error of 0.8 mL on the modified texture foods dataset.”
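For readers wondering what a signed percentage intake error means in practice, here is one plausible way to compute it. The paper's exact definition is not reproduced here; this sketch assumes the error is the estimated intake minus the true intake, both expressed as a percentage of the full portion's volume, and the function names are hypothetical.

```python
# Hypothetical sketch of a signed percentage intake error.
# Assumption: error = estimated intake (%) - true intake (%).

def percent_intake(full_ml: float, leftover_ml: float) -> float:
    """Share of the full portion that was eaten, as a percentage."""
    if full_ml <= 0:
        raise ValueError("full portion volume must be positive")
    return 100.0 * (full_ml - leftover_ml) / full_ml


def percent_intake_error(full_ml: float, true_leftover_ml: float,
                         estimated_leftover_ml: float) -> float:
    """Negative values mean the system under-estimated how much was eaten."""
    estimated = percent_intake(full_ml, estimated_leftover_ml)
    true = percent_intake(full_ml, true_leftover_ml)
    return estimated - true


if __name__ == "__main__":
    # A 200 mL portion with 50 mL truly left, but 60 mL estimated as left.
    print(percent_intake_error(200.0, 50.0, 60.0))
```

With these made-up numbers the true intake is 75 per cent, the estimate is 70 per cent, and the error is five points in the negative direction, meaning the estimate understates how much was eaten.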

A series of plates with varying amounts of food on them, along with the segmentation images produced by the EDFN and EDFN-D networks alongside GC and GC-D networks. The team's EDFN-D approach offers improved accuracy across a range of food types and consumption levels.

“My vision,” says Pfisterer, “would be to monitor and leverage any changes in food intake trends as yellow or red flags for the health status of residents more generally and for monitoring infection control.”

The team’s work has been published under open-access terms in the journal Scientific Reports.


Kaylen J. Pfisterer, Robert Amelard, Audrey G. Chung, Braeden Syrnyk, Alexander MacLean, Heather H. Keller, Alexander Wong: Automated food intake tracking requires depth-refined semantic segmentation to rectify visual-volume discordance in long-term care homes, Sci Rep 12, 83. DOI 10.1038/s41598-021-03972-8.

Victoria Hammer Castellanos, Yvette N. Andrews: Inherent flaws in a method of estimating meal intake commonly used in long-term-care facilities, JAND Vol. 102, Iss. 6. DOI 10.1016/S0002-8223(02)90184-7.

Heather H. Keller, Vanessa Vucea, Susan E. Slaughter, Harriët Jager-Wittenaar, Christina Lengyel, Faith D. Ottery, Natalie Carrier: Prevalence of Malnutrition or Risk in Residents in Long Term Care: Comparison of Four Tools, J Nutr Gerontol Geriatr. Oct-Dec 2019 38(4). DOI 10.1080/21551197.2019.1640165.


A freelance technology and science journalist and author of best-selling books on the Raspberry Pi, MicroPython, and the BBC micro:bit, Gareth is a passionate technologist with a love for both the cutting edge and more vintage topics.
