Artificial Neural Networks (ANNs) have emerged as powerful models in machine learning, loosely mimicking the neural connections of the human brain. By processing and transmitting information across layers of interconnected units, ANNs learn complex patterns, make predictions, classify data, and solve a wide range of problems. Their applications extend across many domains, particularly Artificial Intelligence (AI), where ANNs serve as fundamental frameworks for developing intelligent systems. They excel in computer vision, natural language processing, robotics, and recommendation systems. Deep Neural Networks (DNNs), composed of multiple layers, are especially effective at capturing intricate relationships and extracting meaningful representations, which has led to breakthroughs in image recognition, speech synthesis, and natural language understanding. ANNs also possess remarkable generalization capabilities, adapting to new data and making accurate predictions on unseen examples; this adaptability makes them invaluable in dynamic and uncertain environments. They are widely adopted in supervised, unsupervised, and reinforcement learning, enabling them to handle diverse problem domains and objectives. In this paper, we explore the influence of ANN size on AI training and performance, investigating which configurations achieve superior results in different scenarios [1-2].

The size of an ANN holds significant importance in the context of AI training. The number of neurons and layers greatly influences a network's learning capacity and overall performance. Larger ANNs, with higher neuron and layer counts, have greater representational power and can capture more intricate patterns and relationships within the data. However, a balance must be struck: overly large ANNs are prone to overfitting, where the network becomes so specialized to the training data that it fails to generalize to unseen examples. Regularization techniques such as dropout address this challenge. Dropout randomly deactivates a fraction of neurons during training, mitigating overfitting and enhancing the network's ability to generalize (a minimal sketch of this mechanism is given below). By temporarily removing neurons, dropout compels the network to rely on different subsets of neurons for each training sample, encouraging it to learn more diverse features and improving performance on unseen data. The choice of learning rate during training is another critical factor: a low learning rate supports effective exploration of the action space and reduces the chance of the ANN becoming trapped in a policy with exceedingly low overall reward. A judicious combination of appropriate ANN size, dropout regularization, and a suitable learning rate is therefore essential for achieving optimal performance and effective exploration in AI training scenarios [5-6].

In pursuing this balance between performance and ANN size in reinforcement learning (RL), identifying the smallest working network size that still exhibits the desired performance characteristics holds immense potential for the AI landscape.
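To make the dropout mechanism described above concrete, the following minimal sketch shows a small network with a dropout layer. The framework (PyTorch), the dropout rate, and the layer sizes are illustrative assumptions, not the exact configuration used in our experiments.

```python
import torch
import torch.nn as nn

# Minimal network with dropout: during training, each hidden activation is
# zeroed with probability p, so the network cannot rely on any single neuron.
model = nn.Sequential(
    nn.Linear(125, 64),   # (125,) input, matching the observation shape used later
    nn.ReLU(),
    nn.Dropout(p=0.5),    # illustrative dropout rate
    nn.Linear(64, 9),     # 9 action outputs, as in our Atari setup
)

x = torch.randn(8, 125)

model.train()             # dropout active: different neurons dropped per sample
y_train = model(x)

model.eval()              # dropout inactive: the full network is used at evaluation
y_eval = model(x)
```

PyTorch implements inverted dropout, scaling the kept activations by 1/(1-p) during training, so no manual correction is needed at evaluation time.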
Such a discovery would not only yield substantial resource savings but also enable the deployment of AI algorithms on resource-constrained platforms with limited computational power and memory. The ability to achieve strong performance with minimal computational requirements could disrupt various domains, including embedded systems, mobile devices, and Internet of Things (IoT) applications, making AI feasible in scenarios that resource constraints previously ruled out. This opens new avenues for efficient AI systems, increasing productivity and enabling breakthroughs in deploying AI in resource-limited environments [7-8].

Finding this balance between performance and ANN size in RL is the central motivation of our research. To this end, we use the A2C (Advantage Actor-Critic) algorithm, which combines policy-based and value-based methods and enables a fine-tuned balance between exploration and exploitation. In our experiments, the actor and critic networks share a common body (see the sketch below). This shared architecture facilitates effective communication between the two networks, improving performance and stability during learning: the critic provides detailed feedback on the value of states, enhancing the actor's decision-making without drastically altering its outputs. This setup maintains a delicate equilibrium between exploration and exploitation, leading to robust learning outcomes. A2C's suitability for environments with numerous input attributes also makes it an excellent choice for tasks requiring complex sensory processing. By leveraging this shared-body architecture, our research aims to uncover the relationship between ANN size, AI training, and performance, ultimately providing insights that optimize the deployment of AI algorithms [9-10].

Atari simulations were chosen as the testbed for evaluating ANN performance because of their significance in AI research. These simulations provide a rich and diverse set of environments with varying rules and dynamics, offering a comprehensive evaluation platform. Each Atari game presents unique challenges, ranging from simple, intuitive tasks with immediate rewards to complex, strategic gameplay demanding long-term planning and decision-making. This diversity lets us examine how ANNs perform under varied conditions and test their ability to learn and generalize across environments. Some simulations reward quick reflexes and immediate actions, demonstrating the AI's capacity to respond in real-time scenarios; others involve more complex gameplay mechanics that challenge the AI's strategic thinking, long-term planning, and ability to recognize patterns and make informed decisions.
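As referenced above, the following is a minimal sketch of a shared-body actor-critic network. PyTorch, the single shared layer, and the hidden width are assumptions for illustration; the exact networks we trained are described in the experimental setup.

```python
import torch
import torch.nn as nn

class SharedA2CNet(nn.Module):
    """Actor and critic heads on one shared body: both heads read the same
    learned features, so critic feedback shapes the representation the
    actor relies on without directly overwriting the actor's outputs."""

    def __init__(self, obs_dim: int = 125, n_actions: int = 9, hidden: int = 128):
        super().__init__()
        self.body = nn.Sequential(        # shared feature extractor
            nn.Linear(obs_dim, hidden),
            nn.ReLU(),
        )
        self.policy_head = nn.Linear(hidden, n_actions)  # actor: action logits
        self.value_head = nn.Linear(hidden, 1)           # critic: V(s)

    def forward(self, obs: torch.Tensor):
        features = self.body(obs)
        return self.policy_head(features), self.value_head(features)

# One forward pass on a batch of (125,)-shaped observations:
net = SharedA2CNet()
logits, value = net(torch.randn(4, 125))
action = torch.distributions.Categorical(logits=logits).sample()
```

Because the two heads share gradients through the body, value-learning signals and policy-learning signals both refine the same features, which is the communication channel referred to above.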
By exploring this wide range of simulations, our research aims to assess the adaptability and generalization capabilities of AI algorithms across different tasks and rule sets [11-14].

Drawing inspiration from previous studies such as "Human-level control through deep reinforcement learning" and "Playing Atari with Deep Reinforcement Learning" [24-25], we adopted a similar ANN structure, with a larger hidden layer scaling down toward the output layer. The layer sizes were adjusted to the requirements of the "rom" version of the simulation, which has an input shape of (125,) rather than pixel input. Multiple episodes and simulations were conducted for each ANN size, ranging from 64-32-16-9 to 512-256-128-64-32-16-9, to evaluate their performance comprehensively (a sketch of how such halving-layer stacks can be constructed is given below). To ensure a diverse set of challenges, we selected Atari games with substantially different input setups, allowing us to assess the adaptability, generalization capabilities, and overall performance of the ANNs across a range of scenarios. Recognizing the dynamic nature of these games, we trained the networks after every single action with one epoch of training, so that they could continually adapt to the evolving game dynamics and make more informed decisions. With this experimental setup, we aimed to characterize the performance of ANNs across different sizes and the relationship between size, AI training, and performance.

The analysis of performance across ANN sizes revealed several key findings. Both the smallest and the largest sizes tended to perform suboptimally. Small ANNs struggled to capture the complexity of the data, resulting in limited learning capacity and lower rewards. Larger ANNs initially performed better thanks to their greater capacity to store and process information, but beyond a certain threshold the drawbacks of overfitting became prominent and performance declined. These findings align with the expected bell-like curve: small ANNs yield low rewards, larger ANNs yield better rewards, and an optimal point exists somewhere in between. This highlights the importance of finding the sweet spot where the ANN's capacity matches the complexity of the problem at hand.

Beyond the performance distribution, we investigated other factors that could contribute to the observed behavior: the runtime of each episode, the number of actions taken per episode, the average reward per episode for specific ANN sizes, and the largest reward collected across episodes for different ANN sizes. By considering these factors together, we aimed to uncover patterns in the relationships between ANN size, runtime, action frequency, and reward outcomes. Interestingly, the distribution of performance was not always as expected: in some cases, ANNs with the fewest neurons yielded surprisingly high results, challenging the conventional understanding. To explain such occurrences, we thoroughly examined all collected patterns of factors, aiming to uncover the underlying mechanisms and provide a comprehensive account of the observed behavior [15-18]. The impact of simulation responsiveness on ANN performance proved to be one such crucial aspect.
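As noted above, the evaluated architectures are stacks of hidden layers that halve in width down to the output. The helper below is a hypothetical reconstruction of how such stacks can be built, assuming PyTorch, ReLU activations, and the (125,) input and 9-action output from our setup; the function name and the cutoff width of 16 are illustrative.

```python
import torch.nn as nn

def build_halving_mlp(widest: int, obs_dim: int = 125, n_actions: int = 9) -> nn.Sequential:
    """Stack hidden layers that halve in width down to 16, then map to the
    action outputs, e.g. 64-32-16-9 or 512-256-128-64-32-16-9."""
    layers, in_dim, width = [], obs_dim, widest
    while width >= 16:
        layers += [nn.Linear(in_dim, width), nn.ReLU()]
        in_dim, width = width, width // 2
    layers.append(nn.Linear(in_dim, n_actions))
    return nn.Sequential(*layers)

small = build_halving_mlp(widest=64)    # smallest tested size: 64-32-16-9
large = build_halving_mlp(widest=512)   # largest tested size: 512-256-128-64-32-16-9
```

Under the per-action training regime described above, each environment step would then be followed by a single update (one epoch) on the latest transition; the exact optimizer and loss used in our runs are not reproduced here.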
We observed that simulations requiring a higher number of moves before rewards are given tend to need larger ANN sizes to achieve better performance. This correlation highlights the need for greater capacity within the ANN to effectively capture the complex patterns and strategies present in such environments. The behavior of the agent, the type of reward provided for each action, and the RL algorithm employed also play significant roles. In systems where the agent is rewarded for every single action and the best policy is the one that yields the next significant reward, a relatively simple equation can represent the optimal policy, so smaller ANNs can handle these tasks effectively. For more complex tasks, involving intricate policies and a more nuanced relationship between actions and rewards, larger and more complex ANNs become necessary. This observation holds particularly for A2C-type algorithms, where discovering the best state and reward requires solving a complex sequence of actions. Although complex equations can be optimized and simplified, it may be possible to leverage the capacity of smaller ANNs to approximate the functions performed by larger ANNs effectively [19-23].

Smaller ANNs demonstrated superior performance in certain simulations, outperforming larger sizes in highly responsive environments. To explain this behavior, we explored the theory that small ANNs possess exactly the right density to hold information rather than approximate it, unlike significantly larger ANNs. To test this theory, we examined the performance outliers across all types and sizes of ANNs. We found that the better-performing ANNs had smaller gradients applied to them, where the gradient scales with the reward combined with the difference between V(t) and V(t+1) in A2C algorithms. This suggests that small ANNs can hold information effectively: once a successful policy is discovered, through luck or chance, it is retained because the network's density suits it, and less crucial information is forgotten during further learning. This finding challenges the notion that larger ANNs are always superior in performance. It emphasizes the role of density and the balance between retaining crucial strategies and avoiding overfitting. By exploring the capabilities of small ANNs and their capacity to hold effective strategies, we gain valuable insights into the potential advantages of these compact architectures [3-4].
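For reference, the quantity described above, the reward combined with the difference between V(t) and V(t+1), corresponds to the standard one-step advantage used in A2C, A_t = r_t + g*V(s_{t+1}) - V(s_t), where the discount factor g is our notational assumption rather than a value stated in this section. A minimal sketch, again assuming PyTorch:

```python
import torch

def one_step_advantage(reward: torch.Tensor,
                       value_t: torch.Tensor,
                       value_next: torch.Tensor,
                       done: torch.Tensor,
                       gamma: float = 0.99) -> torch.Tensor:
    """One-step A2C advantage: reward plus the discounted next-state value,
    minus the current value estimate. Policy and value gradients scale with
    this quantity, so accurate value estimates imply small applied updates."""
    bootstrap = gamma * value_next * (1.0 - done)  # no bootstrapping past terminal states
    return reward + bootstrap - value_t

# Example: a well-calibrated critic yields a near-zero advantage and hence a
# small gradient, consistent with the low gradients observed for the
# better-performing small networks (here: 1.0 + 0.99 * 4.1 - 5.0 = 0.059).
adv = one_step_advantage(torch.tensor([1.0]), torch.tensor([5.0]),
                         torch.tensor([4.1]), torch.tensor([0.0]))
```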