

Stone-crushing plants are essential for producing aggregates for construction projects. One of the most important components of such plants is the cone crusher, which is used as a secondary or tertiary stage of crushing. The cone crusher determines the shape and size of the final products, as well as the energy consumption and throughput of the plant. However, cone crushers are often run inefficiently and at suboptimal settings, leading to wasted resources and lower-quality output. In this post, we will explore how advanced techniques in deep reinforcement learning can be used to optimize the performance of cone crushers and achieve significant improvements in power efficiency, quality, and throughput.

This post builds on previous posts on our blog, which cover the prerequisites and background needed for this article, such as the fundamentals of a stone-crushing plant, sensor data collection and preprocessing, and algorithms for preventive maintenance. We recommend reading those posts before proceeding with this one. We hope you find this post informative and useful for your projects.


Why Reinforcement learning?

In earlier posts, we have discussed how we can use sensor data from the machines in the plant to train anomaly detection models for predictive maintenance. We can tell whether a machine is functioning properly by looking at the reconstruction loss of our autoencoder: if the loss is too high, something may be wrong with the machine. This, however, is where the applications of classical machine learning and deep learning stop, since we can only detect anomalies. What if we want to optimize for quality, power efficiency, and throughput (tonnes per hour)?

To achieve autonomous control and optimize for these objectives, we need a different class of algorithms: reinforcement learning.


How does Reinforcement learning work?


Let’s suppose, as an example, that our reinforcement learning agent is learning to play Mario. The reinforcement learning process can be modeled as an iterative loop that works as follows:

  • The RL Agent starts in a situation, or “state” (S⁰), in the game environment (Mario, in our example).

  • Based on this initial state (S⁰), the RL Agent decides on an action (A⁰) to take. For example, it might choose to move right. At the start, this decision is made randomly.

  • After the action (A⁰) is taken, the game environment changes and enters a new state (S¹). This could be a new frame in the game Mario.

  • The game environment then rewards the RL Agent (R¹). For instance, it might get a +1 reward because it’s still alive in the game.

  • This process of state-action-reward (S⁰, A⁰, R¹) continues in a loop until the game ends, either by reaching the destination or losing the game. This results in a sequence of states, actions, and rewards.

  • The main goal of the RL Agent is to maximize the total reward it receives.

In simpler terms, think of it like this: The RL Agent is playing a game of Mario. It starts the game (S⁰), makes a move (A⁰), and the game changes based on that move (S¹). The game then gives the RL Agent points (R¹) for staying alive. This continues until the game ends. The RL Agent aims to earn as many points as possible.
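To make this loop concrete, here is a minimal sketch of the state-action-reward cycle written against the Gymnasium environment interface. The CartPole environment simply stands in for Mario, and the agent below acts randomly instead of learning.

```python
import gymnasium as gym

# Any Gymnasium environment works here; "CartPole-v1" stands in for Mario.
env = gym.make("CartPole-v1")

state, info = env.reset()                  # S0
total_reward = 0.0
done = False
while not done:
    action = env.action_space.sample()     # A_t: random at first, learned later
    state, reward, terminated, truncated, info = env.step(action)  # S_{t+1}, R_{t+1}
    total_reward += reward
    done = terminated or truncated

env.close()
```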

Cumulative discounted rewards

  • Gt: This represents the total reward that our RL agent expects to get starting from time t.

  • ∑: This is the summation symbol, indicating that we’re adding up all the rewards the agent gets over future time steps.

  • γ^k: This is the discount factor raised to the power of k. The discount factor, γ, is a number between 0 and 1. It determines how much less valuable future rewards are compared to immediate ones. The further in the future a reward is (the larger k is), the less valuable it is.

  • Rt+k+1: This is the reward the agent gets at time t+k+1.

So, the equation Gt = ∑γ^k * Rt+k+1 is saying that the total expected reward at time t (Gt) is the sum of all future rewards (Rt+k+1), but each future reward is discounted by γ^k.

In simpler terms, it’s a way for the RL agent to balance immediate rewards with future rewards. The agent wants to maximize this total expected reward. That’s its main goal in the game.
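As a quick illustration, here is how that discounted sum could be computed in Python; the reward sequence and discount factor below are arbitrary example values.

```python
def discounted_return(rewards, gamma=0.99):
    """G_t = sum_k gamma^k * R_{t+k+1}, given the rewards R_{t+1}, R_{t+2}, ..."""
    g = 0.0
    for k, r in enumerate(rewards):
        g += (gamma ** k) * r
    return g

# The final +100 arrives three steps later, so it is discounted by 0.99**3 (about 0.97).
discounted_return([1, 1, 1, 100])  # roughly 100.0 in total
```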


The agent tries to maximize the reward it will get in the next state plus all the rewards it will receive until termination, but every subsequent reward is discounted by the discount factor. Because of this discounting, the agent is encouraged to prioritize near-term rewards over long-term ones while still maximizing the total expected reward.

We will look at how the agent maximizes the total expected reward later in the post.


Reinforcement learning and digital twins

Imagine a digital twin as a virtual doppelganger of a physical object or process. It mirrors the real-world counterpart’s behavior in real-time, providing insights into its past performance, current status, and even future trends. This can be a game-changer when it comes to troubleshooting and preemptive problem-solving.


A digital twin can serve as the perfect playground for this reinforcement learning agent. It can experiment, make mistakes, and learn in this virtual environment without any real-world risks or costs. The lessons learned here can then be applied in the real world, optimizing performance.


The digital twin also serves as a validation tool for the results of reinforcement learning. It ensures that the behaviors learned by the agent lead to the desired outcomes when implemented in the real world.


The illustration above is taken from “A digital twin to train deep reinforcement learning agent for smart manufacturing plants: Environment, interfaces and intelligence”, which also provides a comprehensive review of the available literature on digital twins.


Much of the information in the rest of this post is drawn from Xia et al. (2020), but we will look at the application of similar techniques in the context of cone crushers.


Proposed digital twin system


Here we can see that the system has subsystems that operate on three levels:


Digital engine

"In the field of process system engineering, Reinforcement Learning has been applied to better solve some challenging optimal control problems like the dual adaptive control and more general nonlinear stochastic optimal control. Therefore, the Digital Engine is proposed to use Reinforcement Learning with deep neural networks, namely Deep Reinforcement Learning, to solve stochastic optimal control problems with uncertainties from either the highly complicated signals or processes"

In this context, the term “digital engine” refers to a reinforcement learning agent. This agent is designed to optimize operations within a physical cell by learning through interactions with a virtual cell. The learning process involves the agent taking state-action pairs and receiving corresponding rewards. The goal of the agent is to maximize these rewards. Since this training occurs within the virtual cell, it’s relatively inexpensive and more efficient than training with the physical system.

During the production phase, the digital engine plays a crucial role in determining the manufacturing strategy. This is achieved by feeding sensor data from the physical system to the digital engine as a state. The engine then outputs an action policy designed to maximize total cumulative rewards.

To ensure the digital engine stays updated and effective, it undergoes periodic training based on the data received from the physical system. This continuous learning and adaptation process allows the digital engine to effectively optimize operations within the physical cell.


Virtual Cell

"The virtual manufacturing cell(components in blue) accommodated by selected industrial simulation software, enabling the testing and commissioning of control logics and programs to be pushed to the physical plant."

The term “virtual cell” is essentially what we commonly refer to as a digital twin. It’s a simulation that accurately replicates the physical machine in a virtual environment. This includes modeling the physics of the various moving parts, sensors, processes such as heat and electromagnetic effects, and physical limits.

This precise simulation allows us to explore different scenarios through the digital engine, without incurring the additional costs associated with running the physical cell. It’s like having a sandbox where we can freely experiment and learn, without worrying about real-world constraints or repercussions.

Moreover, the virtual cell enables us to test and optimize our strategies before implementing them in the physical cell. This not only enhances efficiency but also reduces the risk of errors in the actual operation. It’s a testament to how digital technology is revolutionizing the way we approach and manage physical systems.


Physical cell

It is the physical manufacturing cell (components in orange), including sensors, PLC controllers, middleware control components, and other actuators.

The physical cell refers to the actual machine that we’re aiming to optimize. It’s equipped with sensors and controllers that play a crucial role in data collection and machine control. These components work in tandem with the digital engine, enabling us to make informed decisions based on real-time data from the machine.

This setup allows us to monitor the machine’s performance closely, identify potential issues, and make necessary adjustments promptly. The controllers enable us to implement the strategies devised by the digital engine directly onto the physical machine.

In essence, the physical cell and the digital engine work together as a cohesive unit. The physical cell provides the real-world platform for implementation, while the digital engine offers the computational power for optimization. This synergy is what drives the efficiency and effectiveness of our operations.


Digital Engine training process

The training process of the digital engine, which is a reinforcement learning agent, involves a systematic approach that includes primary training, secondary training, and production.

  1. Primary Training: The initial stage of training takes place in a low-resolution version of the virtual cell. Since the virtual cell is a simulation, creating a low-resolution version is straightforward. This allows for rapid initial training of the digital engine, with multiple simulations running concurrently.

  2. Secondary Training: Once the reward gains from the low-resolution virtual cells plateau, the training shifts to the full-resolution virtual cell. This ensures that the agent receives data similar to what it will encounter during runtime, enhancing the accuracy of its learning.

  3. Production: After the digital engine has been sufficiently trained and it’s deemed safe for deployment, it’s time to optimize the real-world industrial plant. In this phase, the digital engine receives sensor data as a state and takes action in the physical environment to maximize the total cumulative reward.


Virtual cell construction

In their paper, K. Xia et al. use Siemens Process Simulate to build the virtual cell. They built a virtual cell of a gripper robot to demonstrate these concepts.

  1. Create CAD models: CAD models are created using software like Siemens NX, SolidWorks, Blender, etc. Siemens Process Simulate allows you to import CAD models into the software seamlessly.

  2. Define component kinematics: The gripper was composed of coupled double-crank mechanisms (parallelogram linkages), and the kinematics of each mechanism were defined in the software.

  3. Link components: The defined components were grouped and linked, and their relative translations or rotations were specified. The diagram linking each unit specifies their kinematic dependencies, since the relative translations and rotations can be numerically defined by drawing an arrow between two units.

  4. Jog joints to verify kinematic definitions: Finally, the kinematic definitions were tested by jogging the joints and checking that the gripper moved as expected.

For our application, we will have to repeat this process for the cone crusher.



Following the construction of the virtual cell, there are a series of logistical steps that need to be undertaken. These include modeling the Programmable Logic Controller (PLC), sensors, connections, and linking the digital engine with the virtual cell using these connections. However, these steps are highly application-specific and may not directly translate to every use case.


It’s worth noting that Siemens Process Simulate appears to facilitate a seamless integration of these components. This capability enhances the efficiency of the setup and ensures a smooth transition from the virtual to the physical environment.


Deep Reinforcement learning as the digital engine

In the world of manufacturing, we’re seeing a real game-changer with the advent of smart manufacturing systems. These systems are all about adaptability and intelligence. They’re designed to respond to a whole range of changes, from shifts in the environment to policy updates, system prognosis, and even the hunt for the optimal solution. One of the key players in this field is Deep Reinforcement Learning (DRL), a subset of artificial intelligence. DRL has been making waves due to its knack for keeping trial expenses to a minimum, making it a cost-effective choice for a variety of applications.

But here’s the catch: when you’re dealing with large-scale manufacturing processes, applying DRL can be a bit of a challenge. It can be costly and complex. That’s where digital twin technologies come in. These technologies, which include everything from simulation and sensors to industrial control and real-time communications, allow us to create a virtual environment for trials. This approach not only cuts costs but also reduces the risks associated with real-world trials. Plus, thanks to the compatibility of Siemens software and hardware products, the digital twin is ready to roll in industrial processes. The end result? A seamless integration of machine learning algorithms into the industrial manufacturing optimization process, setting the stage for more efficient and effective operations.


Applying Reinforcement Learning to Crushing Plants

Up until now, we’ve been exploring concepts from K. Xia et al., applied to a robotic gripper to demonstrate their effectiveness. Now, we’re going to shift gears and see how these concepts of digital twins and reinforcement learning can be applied to a stone-crushing plant.


While the environment in the paper is highly controlled, making it somewhat easier to model the machinery and its behavior, a stone-crushing plant will present a different set of challenges. Being outdoors, the plant operates in an environment that introduces a significant amount of random noise into the reinforcement learning environment. This makes the optimization problem considerably more complex, but it’s these real-world scenarios that truly test the robustness and adaptability of our concepts.


Environment


The state of the environment is constructed from the following sensor readings:

  1. Vibration Sensor: Measures the frequency, peak magnitude frequency, and mean frequency, providing insights into the machinery’s vibration profile.

  2. Acoustic Sensor: Captures audio data, which is converted into a mel spectrogram, offering a glimpse into the operational sound patterns of the machinery.

  3. Cubicity Cameras: These devices monitor the 20 mm cubicity, 10 mm cubicity, and return feed cubicity, helping to keep track of the size and shape of the crushed stones.

  4. Oil Data Sensors: By measuring the mean oil flow and mean oil temperature, they ensure the machinery operates within safe and optimal conditions.

  5. Power Usage Sensor: Monitors the power usage of the machinery, a crucial factor for energy efficiency.


In our previous post on predictive maintenance, we developed a feature vector to represent the state of the system. Interestingly, we can use the same data to represent the state of both the virtual and physical cells in our current context.

However, to make the process more computationally efficient, we work with a multi-discrete observation space. This means that for each reading, we first determine the lowest and highest possible values, then break that range into regular intervals, effectively discretizing it.

This approach allows us to handle a wide range of values without overloading our computational resources and we still end up with a highly descriptive observation space.
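As a rough sketch of what this discretization might look like in code (the sensor ranges and bin widths below are placeholders, not actual plant limits):

```python
import numpy as np

# Illustrative bin edges; real limits would come from the plant's historical data.
power_bins = np.linspace(0.0, 250.0, num=26)       # kW, in 10 kW intervals
oil_temp_bins = np.linspace(20.0, 90.0, num=15)    # deg C, in 5 deg intervals

def discretize(reading, bins):
    """Map a continuous sensor reading to the index of the interval it falls in."""
    return int(np.digitize(reading, bins))

# Two components of a multi-discrete observation
observation = (discretize(132.4, power_bins), discretize(61.7, oil_temp_bins))
```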


Action space


The agent controls the plant through the following actions:

  1. Eccentric Speed: This controls the speed of the motor connected to the eccentric bush in the cone crusher. Changing this speed changes the way the rock is crushed in the crushing chamber.

  2. CSS/Shaft Position: This controls the height of the main shaft and thus the Closed Side Setting (CSS) of the cone crusher. Changes here affect the output size, the distribution of aggregate sizes, and, most importantly, the amount of recycled material. If the amount of recycled material is high, a larger portion of the feed gets crushed multiple times, resulting in higher cubicity but more energy consumed.

  3. Vibro Feeder Schedule: This controls the set-up time and the steady state motor speed of the vibro feeder. In an earlier post, we explained that the vibro feeder regulates the amount of material going inside the cone crusher from the stockpile. The cone crusher must be choke-fed to achieve maximum cubicity.

  4. Grizzly Feeder Schedule: Similar to the vibro feeder schedule, the grizzly feeder controls the amount of material going into the jaw crusher.


Each of these actions can be adjusted to optimize the operation of the stone-crushing plant. The reinforcement learning system explores this action space to find the best combination of actions that maximizes the total cumulative reward.
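If we were to express this action space with the Gymnasium library, it might look something like the sketch below; the bin counts are hypothetical and would need to be chosen from the crusher’s actual operating ranges.

```python
from gymnasium import spaces

# Hypothetical discretization; the bin counts are placeholders, not plant specs.
action_space = spaces.MultiDiscrete([
    10,  # eccentric speed: 10 speed levels
    8,   # CSS / main shaft position: 8 height steps
    6,   # vibro feeder schedule: 6 preset profiles
    6,   # grizzly feeder schedule: 6 preset profiles
])

action = action_space.sample()  # e.g. array([3, 7, 1, 4])
```

A multi-discrete space like this keeps the number of possible actions manageable while still covering every control the agent needs to adjust.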


Reward function


The reward function incorporates data related to cubicity, throughput (measured in tonnes per hour), and power consumption, all derived from the state of the environment. Each of these components is weighted by a corresponding coefficient. If we wish to emphasize a particular component, we can simply increase its coefficient; that component then contributes more to the expected reward and carries more weight in the optimization. This flexibility allows us to fine-tune the reinforcement learning system to focus on the aspects that are most important for our specific objectives.


For instance, if our goal is to maximize power efficiency, we would increase the value of ‘C’, which is the coefficient for power usage. A higher ‘C’ value means that greater power consumption results in a lower reward. Consequently, the optimization algorithm will strive to minimize the power consumption of the plant, aligning its actions with our objective of enhanced power efficiency.
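A minimal sketch of such a weighted reward, assuming the state already exposes cubicity, throughput, and power readings (the names and coefficient values here are illustrative):

```python
def reward(cubicity, throughput_tph, power_kw, a=1.0, b=1.0, c=1.0):
    """Weighted reward; the coefficients and scaling are illustrative placeholders."""
    return a * cubicity + b * throughput_tph - c * power_kw

# Emphasizing power efficiency: a larger c makes high power draw more costly.
reward(cubicity=0.82, throughput_tph=180.0, power_kw=130.0, c=2.0)
```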


Optimization

What is the Q value?

In reinforcement learning, the Q-value is the expected sum of rewards from taking an action in a given state. It can be interpreted as being in a state S, taking an action A, and thereafter acting optimally, i.e., always taking the action that yields the maximum sum of rewards. The definition is recursive, because the Q-function appears inside its own equation.
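Written out, this recursive definition is the Bellman optimality equation, where the expectation is over the environment’s transitions:

```latex
Q^{*}(s, a) = \mathbb{E}\left[ R_{t+1} + \gamma \max_{a'} Q^{*}(S_{t+1}, a') \,\middle|\, S_t = s,\; A_t = a \right]
```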


Temporal difference

This is the update rule for Q-learning: the Q-value for the given state and action is updated by adding to the current estimate a fraction of the difference between the target (the reward received plus the discounted Q-value of the next state) and the current estimate itself. This is called temporal difference learning, because we take the difference between the target value (where the reward is known) and the current estimate of the Q-value. The estimate iteratively gets closer to the true value as we keep exploring the environment.
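For reference, the update described above is usually written as follows, where α is the learning rate (step size):

```latex
Q(S_t, A_t) \leftarrow Q(S_t, A_t) + \alpha \left[ R_{t+1} + \gamma \max_{a} Q(S_{t+1}, a) - Q(S_t, A_t) \right]
```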


What is the need for deep neural networks?

You may wonder why we need to use a neural network in these equations, as it seems to add another layer of complexity to the training process. However, it actually reduces the computational requirements. Notice that we have to select the maximum Q value from all the possible actions to estimate the target Q value. We also have to do this for every possible state in the environment to effectively fill the Q table. This is feasible when we have low-resolution environments, but if we have 200 actions and 20,000 possible states, it becomes almost impossible to compute the Q table.


Neural networks as Q-value estimators

To address this problem, we use a neural network to approximate the Q-function. We estimate the Q value for the current state using the Q network, and the target value is obtained by adding the reward received and the Q value from the target network for the next state. This way, we avoid having to iterate through all the actions and states to estimate Q values.

However, you may wonder why we need two networks instead of one. The reason is that using the same network to estimate the Q value for both the current and the next state results in an unstable target for the Q network. This is because the network weights are updated at every step, which changes the target for the next step as well. This creates a feedback loop that hinders the learning process. To solve this problem, we use two networks: one for the Q value estimation and one for the target estimation. The target network is only updated periodically, ensuring a more stable and consistent target.
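As a rough sketch of what these two networks could look like in PyTorch (the layer sizes, state dimension, and action count are placeholders, not values from this project):

```python
import torch.nn as nn

class QNetwork(nn.Module):
    """Small fully connected network: state in, one Q-value per action out."""

    def __init__(self, state_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, n_actions),
        )

    def forward(self, state):
        return self.net(state)

q_net = QNetwork(state_dim=12, n_actions=8)
target_net = QNetwork(state_dim=12, n_actions=8)
target_net.load_state_dict(q_net.state_dict())  # target starts as an exact copy
```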


This is the update rule for double deep Q-learning. Most of the concepts in it have already been explained, apart from replay memory and updating the weights of the Q-network and the target network.

Replay memory:

The replay memory is a data structure that stores the agent’s experiences as it interacts with the environment. Each experience is a tuple of (state, action, reward, next state, done), where the state is the observation of the environment, the action is the action taken by the agent, the reward is the immediate reward received, the next state is the observation that follows, and done is a flag indicating whether the episode has ended. The replay memory allows the agent to sample from its past experiences and use them to train the Q-network and the target network.
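A simple replay memory can be sketched with a fixed-size deque; this is one common way to implement it, not necessarily how the original paper does:

```python
import random
from collections import deque

class ReplayMemory:
    """Fixed-size buffer of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```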


Loss function:

The Q-network is a neural network that takes a state as input and outputs the estimated Q-values for each possible action. The Q-network is trained by minimizing the mean squared error between the Q-values and the target values, which are computed using the Bellman equation. The target value for a state-action pair is the sum of the reward and the discounted Q-value of the next state, given by the target network. The target network is a copy of the Q-network that is updated less frequently. The loss function is optimized using stochastic gradient descent or a variant of it.
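A hedged sketch of this loss computation in PyTorch, following the description above (a plain DQN target computed with the target network):

```python
import torch
import torch.nn.functional as F

def dqn_loss(q_net, target_net, batch, gamma=0.99):
    """Mean squared error between predicted Q-values and Bellman targets."""
    states, actions, rewards, next_states, dones = batch  # batched tensors
    # Q(s, a) for the actions that were actually taken
    q_values = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # Target: r + gamma * max_a' Q_target(s', a'); zero future value at episode end
        next_q = target_net(next_states).max(dim=1).values
        targets = rewards + gamma * (1.0 - dones) * next_q
    return F.mse_loss(q_values, targets)
```

In the double-DQN variant mentioned above, the greedy next action would instead be selected by the online Q-network and only evaluated by the target network.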


Updating the target network:

The target network is used to stabilize the learning process and avoid divergence. The target network is updated by copying the weights of the Q-network every N steps, where N is a hyperparameter. This ensures that the target values do not change too rapidly and that the Q-network can converge to a good approximation of the optimal action-value function.
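In code, this periodic synchronization is a one-liner; the update interval below is purely illustrative:

```python
TARGET_UPDATE_EVERY = 1_000  # the hyperparameter N; this value is just illustrative

if step % TARGET_UPDATE_EVERY == 0:
    target_net.load_state_dict(q_net.state_dict())
```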


Putting it all together

We will apply double deep Q-Networks (DDQN) to optimize our reward function, which consists of three components: cubicity, throughput, and power usage. Each component has a corresponding coefficient that reflects its relative importance in the reward function. By tuning these coefficients, we can prioritize the optimization of certain aspects of the plant over others.

The state representations will be derived from the digital twin, as previously discussed. We will start with low-resolution versions of the digital twin to speed up the training process, and then gradually increase the resolution to match the virtual cell. Once the reinforcement agent is sufficiently trained, we will deploy it to the physical cell to improve the performance of the crushing plant in real-time.
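Putting the earlier sketches together, a training loop might look roughly like this. The select_action (epsilon-greedy exploration) and to_tensors (batch conversion) helpers are hypothetical, and the environment here would be the virtual cell simulation rather than a standard Gymnasium environment.

```python
import torch

# Skeleton training loop reusing the earlier sketches: env, q_net, target_net,
# ReplayMemory, dqn_loss, TARGET_UPDATE_EVERY. Values below are illustrative.
memory = ReplayMemory()
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-4)
batch_size, num_episodes, step = 64, 500, 0

for episode in range(num_episodes):
    state, info = env.reset()
    done = False
    while not done:
        action = select_action(q_net, state, epsilon=0.1)   # hypothetical helper
        next_state, reward, terminated, truncated, info = env.step(action)
        done = terminated or truncated
        memory.push(state, action, reward, next_state, done)
        state = next_state

        if len(memory) >= batch_size:
            batch = to_tensors(memory.sample(batch_size))   # hypothetical helper
            loss = dqn_loss(q_net, target_net, batch, gamma=0.99)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

        if step % TARGET_UPDATE_EVERY == 0:
            target_net.load_state_dict(q_net.state_dict())
        step += 1
```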


Conclusion


In this blog post, we explored the potential of reinforcement learning to optimize stone-crushing plants. We motivated this problem by highlighting the inefficiencies and uncertainties in the crushing process. We reviewed some literature that applied digital twins for manufacturing systems, which are virtual replicas of physical machines that can be simulated and trained in different scenarios. We then applied these concepts to our crushing plant and explained how we can use state, action, and reward representations to model the environment, and how we can use Double Deep Q-Networks to learn an optimal policy that maximizes the expected reward. We emphasized that this approach is efficient because we can train the model in a simulation environment that can be parallelized, and then use the learned policy to control the physical plant to improve the quality and quantity of the output and reduce the power consumption.


