Reinforcement Learning Approach for Efficient Inventory Policy in Multi-Echelon Supply Chain Under Various Assumptions and Constraints
Background: Inventory policy strongly influences the Supply Chain Management (SCM) process. Evidence suggests that almost half of SCM costs are driven by inventory-related expenses.
Objective: This paper aims to minimise total inventory cost in SCM by applying a multi-agent machine learning technique, Reinforcement Learning (RL).
Methods: RL's ability to discover hidden patterns in inventory policy is tested under a set of constraints that previous research has not addressed simultaneously: a capacitated manufacturer and warehouse, limits on order quantities to suppliers, stochastic demand, lead-time uncertainty, and multi-sourcing supply. RL was implemented through Q-Learning in four experiments of 1,000 iterations each to examine the consistency of its results. RL was then compared with an earlier mathematical method to assess its efficiency in reducing inventory costs.
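To illustrate the Q-Learning procedure described above, the following is a minimal sketch of tabular Q-Learning on a simplified single-echelon version of the problem: the state is on-hand inventory, the action is the order quantity, and the cost combines holding and shortage penalties under stochastic demand. All numerical parameters (capacity, order limit, cost rates, demand distribution) are assumed for illustration and are not the paper's actual settings.

```python
import numpy as np

rng = np.random.default_rng(0)

CAPACITY = 10          # warehouse capacity (assumed)
MAX_ORDER = 10         # limit on order quantity to the supplier (assumed)
HOLD_COST = 1.0        # per-unit holding cost (assumed)
SHORT_COST = 5.0       # per-unit shortage penalty (assumed)

ALPHA, GAMMA, EPS = 0.1, 0.95, 0.1   # learning rate, discount, exploration

# Q[s, a]: state = on-hand inventory (0..CAPACITY), action = order quantity
Q = np.zeros((CAPACITY + 1, MAX_ORDER + 1))

def step(inv, order):
    """Receive the order, observe stochastic demand, return (next_inv, cost)."""
    inv = min(inv + order, CAPACITY)       # capacitated warehouse
    demand = rng.integers(0, 8)            # stochastic demand (assumed uniform)
    shortage = max(demand - inv, 0)
    next_inv = max(inv - demand, 0)
    cost = HOLD_COST * next_inv + SHORT_COST * shortage
    return next_inv, cost

inv = CAPACITY
for episode in range(1000):                # 1,000 trial-and-error iterations
    for _ in range(50):                    # periods per episode (assumed)
        if rng.random() < EPS:             # epsilon-greedy exploration
            order = int(rng.integers(0, MAX_ORDER + 1))
        else:
            order = int(np.argmin(Q[inv]))  # greedy: minimise expected cost
        next_inv, cost = step(inv, order)
        # Q-Learning update on cost (minimisation form of the Bellman backup)
        Q[inv, order] += ALPHA * (cost + GAMMA * Q[next_inv].min() - Q[inv, order])
        inv = next_inv

policy = Q.argmin(axis=1)                  # learned order quantity per inventory level
print(policy)
```

The multi-echelon case in the paper extends this idea by enlarging the state to cover every echelon's inventory position and by constraining actions with the supplier order limits and lead-time uncertainty.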
Results: After 1,000 trial-and-error simulations, the most striking finding is that RL performs more efficiently than the mathematical approach by placing optimum order quantities at the right time. Moreover, this result was achieved under a combination of constraints and assumptions that previous studies have not simulated simultaneously.
Conclusion: The results confirm that the RL approach is valuable when applied to supply network environments comparable to the one examined in this study. Since RL still leads to higher shortage levels in this research, combining RL with other machine learning algorithms is suggested to achieve a more robust end-to-end SCM analysis.
Keywords: Inventory Policy, Multi-Echelon, Reinforcement Learning, Supply Chain Management, Q-Learning