
Metal Mine ›› 2025, Vol. 54 ›› Issue (10): 191-200.

• Electromechanics and Automation •

Research on Lane-Change Decision-Making for Intelligent Mining Trucks Using Integrated Reinforcement Learning and State Machines

CHENG Yu1, XIE Lirong1, BIAN Yifan1, YANG Zhiyong2, HU Guilin2, YAN Zhuang1

  1. School of Electrical Engineering, Xinjiang University, Urumqi 830017, China
  2. Xinjiang Tianchi Energy Company Limited, Changji 831100, China
  • Publication date: 2025-10-15  Release date: 2025-11-07
  • Corresponding author: XIE Lirong (born 1969), female, professor, doctoral supervisor.
  • About the author: CHENG Yu (born 2001), male, master's student.
  • Funding:
    Key Research and Development Program of Xinjiang Uygur Autonomous Region (No. 2023B01006); Open Project of the Key Laboratory of Xinjiang Uygur Autonomous Region (No. 2025D04013).

Abstract: To improve lane-change decision-making for intelligent connected mining trucks in open-pit coal mines, this paper proposes a method that integrates deep reinforcement learning with a finite state machine (FSM). First, a two-layer decision framework is constructed: the upper layer uses a deep Q-network (DQN) to generate a preliminary lane-change decision, and the lower layer applies safety constraints through the FSM. Second, double-network and dueling-network structures are introduced to improve DQN performance, which alleviates the Q-value overestimation problem. Third, state-transition rules are designed on the basis of the Gipps safety model to dynamically evaluate whether a lane-change gap is safe. Finally, a multi-objective reward function is designed to evaluate and guide lane-change behavior. Experiments on the Highway-env platform show that, in high-traffic-density scenarios, the fused method reaches a lane-change success rate of 81.36%, a marked improvement over DuDQN alone (50.84%), with fewer collisions and more stable driving. The framework improves both the safety and the efficiency of decision-making and offers a reference for lane-change decisions in open-pit mine haulage.

Key words: intelligent connected mining truck, deep reinforcement learning, finite state machine, lane-change decision, multi-objective reward function
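
The two-layer idea in the abstract (a Q-network proposes a maneuver, a finite state machine vetoes it when the gap is unsafe) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration rather than the authors' implementation: the `State` enum, the `safe_gap`, `gap_is_safe`, and `decide` helpers, and all parameter values (reaction time `tau`, braking rates `b_follow` and `b_lead`, standstill margin `s0`) are hypothetical. The gap test is a simplified stopping-distance criterion in the spirit of the Gipps model, not the paper's exact transition rule. The action names mirror Highway-env's discrete meta-actions, but the code does not import that library.

```python
from enum import Enum, auto


class State(Enum):
    """FSM states of the lower (safety) layer; hypothetical names."""
    LANE_KEEP = auto()
    CHANGE_LEFT = auto()
    CHANGE_RIGHT = auto()


def safe_gap(v_follow, v_lead, tau=1.0, b_follow=3.0, b_lead=4.0, s0=5.0):
    """Minimum gap (m) that lets the follower stop behind a braking leader.

    Simplified Gipps-style criterion: reaction distance plus the follower's
    braking distance, minus the leader's braking distance, plus a standstill
    margin s0. All default values are illustrative assumptions.
    """
    gap = (s0 + v_follow * tau
           + v_follow ** 2 / (2 * b_follow)
           - v_lead ** 2 / (2 * b_lead))
    return max(gap, s0)


def gap_is_safe(ego_v, front_gap, front_v, rear_gap, rear_v):
    """Check the target-lane gap both ahead of and behind the ego truck."""
    front_ok = front_gap >= safe_gap(ego_v, front_v)  # ego follows the front vehicle
    rear_ok = rear_gap >= safe_gap(rear_v, ego_v)     # rear vehicle follows ego
    return front_ok and rear_ok


def decide(q_values, ego_v, target_lane):
    """Upper layer proposes the greedy action; lower layer accepts or vetoes.

    q_values: dict mapping action name to estimated Q-value.
    target_lane: dict mapping "LANE_LEFT"/"LANE_RIGHT" to a tuple
        (front_gap, front_v, rear_gap, rear_v); a missing key means
        there is no lane on that side.
    """
    proposal = max(q_values, key=q_values.get)
    if proposal == "IDLE":
        return State.LANE_KEEP
    gaps = target_lane.get(proposal)
    if gaps is None or not gap_is_safe(ego_v, *gaps):
        return State.LANE_KEEP  # veto: stay in lane when the gap is unsafe
    return State.CHANGE_LEFT if proposal == "LANE_LEFT" else State.CHANGE_RIGHT
```

In this sketch, a proposal of `LANE_LEFT` at 10 m/s with 30 m gaps ahead and behind (neighbors also at 10 m/s) is accepted, while the same proposal with only a 10 m rear gap falls back to `State.LANE_KEEP`. Keeping the veto outside the learned policy means the safety rule holds regardless of how well the Q-network has been trained.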
