Mask R-CNN and GrabCut Algorithm for an Image-based Calorie Estimation System
Background: A calorie estimation system based on food images uses computer vision to recognize food items and estimate their calories. The system relies on two key processes: detection and segmentation. Many algorithms can perform both processes, each with a different level of accuracy.
Objective: This study aims to improve the accuracy of calorie calculation and segmentation processes using a combination of Mask R-CNN and GrabCut algorithms.
Methods: The segmentation masks generated by Mask R-CNN and GrabCut were combined to create a new mask, which was then used to calculate the calories. To evaluate the method's performance, the accuracy of the calorie calculation and segmentation processes was measured, taking image augmentation into account.
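The abstract does not state how the two masks are merged, so the following is only a minimal sketch of one possible pixel-wise combination. The function names `combine_masks` and `mask_area`, the `mode` parameter, and the choice between union and intersection are all assumptions made for illustration, not the authors' procedure. Plain nested lists are used to keep the sketch dependency-free; in practice the masks would usually be image arrays.

```python
from typing import List

# A binary mask: 1 marks a food pixel, 0 marks background.
Mask = List[List[int]]


def combine_masks(mask_rcnn: Mask, grabcut: Mask, mode: str = "union") -> Mask:
    """Combine two same-sized binary masks pixel by pixel.

    Hypothetical helper: the merge rule (union or intersection) is an
    assumption, since the source does not specify how the masks are combined.
    """
    same_rows = len(mask_rcnn) == len(grabcut)
    same_cols = all(len(a) == len(b) for a, b in zip(mask_rcnn, grabcut))
    if not (same_rows and same_cols):
        raise ValueError("masks must have the same shape")

    if mode == "union":
        # Keep a pixel if either algorithm marked it as food.
        return [[a | b for a, b in zip(row_a, row_b)]
                for row_a, row_b in zip(mask_rcnn, grabcut)]
    if mode == "intersection":
        # Keep a pixel only if both algorithms marked it as food.
        return [[a & b for a, b in zip(row_a, row_b)]
                for row_a, row_b in zip(mask_rcnn, grabcut)]
    raise ValueError("mode must be 'union' or 'intersection'")


def mask_area(mask: Mask) -> int:
    """Count the food pixels in a binary mask."""
    return sum(sum(row) for row in mask)


# Toy 2x3 masks standing in for the two algorithms' outputs.
mask_a = [[0, 1, 1],
          [0, 1, 0]]
mask_b = [[0, 0, 1],
          [1, 1, 0]]
combined = combine_masks(mask_a, mask_b, mode="union")
```

A union tends to recover food regions that one algorithm misses, while an intersection trims regions that only one algorithm includes; which behavior suits a given food shape would have to be checked against ground truth.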
Results: The proposed method achieved satisfactory results, with an average calorie calculation error below 10% and an F1 score above 90% in all scenarios.
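The two reported metrics can be illustrated with a short sketch. It assumes the F1 score is computed per pixel between a predicted mask and a ground-truth mask, and that the calculation error is the absolute relative difference between estimated and actual calories; the abstract does not give either definition, and the names `f1_score` and `percent_error` are hypothetical.

```python
from typing import List

Mask = List[List[int]]  # 1 = food pixel, 0 = background


def f1_score(pred: Mask, truth: Mask) -> float:
    """Pixel-level F1 between a predicted and a ground-truth binary mask.

    Assumption: F1 is evaluated per pixel; the source does not say whether
    it is computed per pixel or per detected object.
    """
    tp = fp = fn = 0
    for row_p, row_t in zip(pred, truth):
        for p, t in zip(row_p, row_t):
            if p and t:
                tp += 1  # food predicted and present
            elif p:
                fp += 1  # food predicted but absent
            elif t:
                fn += 1  # food present but missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def percent_error(estimated: float, actual: float) -> float:
    """Absolute relative error, in percent, of an estimated calorie value."""
    if actual == 0:
        raise ValueError("actual value must be non-zero")
    return abs(estimated - actual) / actual * 100


# Toy example: made-up masks and calorie values, not data from the study.
pred = [[1, 1, 0],
        [0, 1, 0]]
truth = [[1, 1, 0],
         [0, 0, 1]]
score = f1_score(pred, truth)
error = percent_error(estimated=95.0, actual=100.0)
```

Under these definitions, the reported thresholds would correspond to `score > 0.90` and an average `error` below 10 across the test images.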
Conclusion: Compared to earlier studies, the combination of Mask R-CNN and GrabCut obtained better results in calculating the calories of foods with different shapes.
Keywords: Augmentation, Calorie Calculation, Detection
Copyright (c) 2022 The Authors. Published by Universitas Airlangga.

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
All accepted papers are published under a Creative Commons Attribution 4.0 International (CC BY 4.0) License. Authors retain copyright and grant the journal the right of first publication. The CC BY license lets others Share (copy and redistribute the material in any medium or format) and Adapt (remix, transform, and build upon the material for any purpose, even commercially).