Metal Mine ›› 2025, Vol. 54 ›› Issue (10): 175-181.

• Electromechanics and Automation •

Lane Keeping Algorithm for Mining Trucks Relying on Retraining Reinforcement Learning

LIU Jinyao¹, XIE Lirong¹, BIAN Yifan¹, AN Yi¹,², YANG Zhiyong³,⁴, HUANG Deqi¹

  1. School of Electrical Engineering, Xinjiang University, Urumqi 830017, China;
  2. School of Control Science and Engineering, Dalian University of Technology, Dalian 116024, China;
  3. Xinjiang Tianchi Energy Company Limited, Changji 831100, China;
  4. Xinjiang Key Laboratory of Intelligent Production and Control of Open Pit Mines, Changji 831100, China
  • Publication date: 2025-10-15 Release date: 2025-11-07
  • Corresponding author: XIE Lirong (b. 1969), female, professor, doctoral supervisor.
  • About the author: LIU Jinyao (b. 2001), female, master's student.
  • Funding:
    Xinjiang Key Research and Development Special Project (No. 2023B01006); Autonomous Region Key Laboratory Fund Project (No. XJQY2007).

Abstract: Autonomous mining trucks operating in complex mine environments tend to lose their ability to adapt
to previously learned policies. To address this problem, a deep reinforcement learning lane-keeping control algorithm
that incorporates sample retraining is proposed. First, based on the characteristics of target-network parameter
updates, a periodic experience-sampling retraining model is derived, which incorporates the retraining episode
interval into the conventional target-network parameter update model. Second, to limit the influence of noise on the
model, sampling from the experience replay buffer is restricted to a smaller range, which reduces the impact of noisy
and irrelevant experiences and improves system robustness under extreme operating conditions. Finally, considering
the cross-shaped intersections typical of open-pit mine roads, the vehicle is placed at an intersection in CARLA, and
the average reward obtained over a fixed number of episodes is used as the key performance indicator in the simulation
experiments. The results show that the proposed periodic-retraining deep Q-network (PR-DQN) strategy reduces
fluctuations during training, makes the model converge faster, improves model performance on tasks in non-stationary
environments, and shows significant advantages in stability and generalization ability.
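
The abstract names three ingredients: replay sampling restricted to a smaller range, retraining at a fixed episode interval, and the average reward over a fixed number of episodes as the metric. The Python sketch below illustrates one possible reading of each and is not the authors' implementation. The names `RecentWindowReplayBuffer`, `should_retrain`, `average_reward`, `window`, and `retrain_interval` are hypothetical. Treating the "smaller sampling range" as the most recent transitions is also an assumption, since the abstract does not give the exact rule.

```python
import random
from collections import deque


class RecentWindowReplayBuffer:
    """Replay buffer that samples only from the newest `window` transitions.

    Assumption: "smaller sampling range" is read here as "most recent
    experiences"; the abstract does not specify the actual rule.
    """

    def __init__(self, capacity, window):
        if not 0 < window <= capacity:
            raise ValueError("window must satisfy 0 < window <= capacity")
        self.buffer = deque(maxlen=capacity)  # oldest items drop off automatically
        self.window = window

    def add(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size, rng=random):
        # Keep only the newest `window` transitions, then draw without replacement.
        recent = list(self.buffer)[-self.window:]
        return rng.sample(recent, min(batch_size, len(recent)))


def should_retrain(episode, retrain_interval):
    """Return True on every `retrain_interval`-th episode (episodes counted from 1)."""
    if retrain_interval <= 0:
        raise ValueError("retrain_interval must be positive")
    return episode > 0 and episode % retrain_interval == 0


def average_reward(episode_rewards, last_n):
    """Mean reward over the last `last_n` episodes (the metric named in the abstract)."""
    tail = episode_rewards[-last_n:]
    if not tail:
        raise ValueError("no episode rewards to average")
    return sum(tail) / len(tail)


if __name__ == "__main__":
    buf = RecentWindowReplayBuffer(capacity=1000, window=100)
    for t in range(500):
        # Placeholder transition: (state, action, reward, next_state, done).
        buf.add((t, 0, 0.0, t + 1, False))
    batch = buf.sample(32)
    print(len(batch), min(tr[0] for tr in batch) >= 400)
    print([e for e in range(1, 21) if should_retrain(e, 5)])
```

In a full training loop, `should_retrain` would gate an extra pass over batches drawn with `sample`. How that pass interacts with the target-network update is described in the paper itself and is not reproduced here.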

CLC number: