Price Optimization in Brazilian Retail: An Approach Based on Reinforcement Learning and Macroeconomic Variables
Dynamic Pricing. Deep Reinforcement Learning. PPO. LSTM. Retail.
In retail markets characterized by volatility, seasonality, and high-dimensional information, dynamic pricing plays a strategic role in revenue optimization and margin preservation. This dissertation investigates the application of Deep Reinforcement Learning (DRL) to dynamic pricing in Brazilian retail, proposing an integrated architecture that combines Proximal Policy Optimization (PPO) with a Long Short-Term Memory (LSTM) forecasting model. The problem is formulated as an augmented Markov Decision Process (MDP) whose state vector incorporates microeconomic variables, macroeconomic indicators, and forward-looking demand signals. The research is structured in three stages: (i) a Systematic Literature Review (SLR) following the PRISMA protocol, which identifies gaps in the integration of exogenous macroeconomic variables and in multidimensional evaluation; (ii) the development of a stochastic simulation environment calibrated with real-world data; and (iii) comparative experiments against economically plausible baseline policies, including inflation-indexed adjustments based on official DIEESE indicators. The consolidated experimental configuration (V8) achieved accumulated revenue of R$ 142.043 million, outperforming the most competitive baseline by 0.9845% (+R$ 1.38 million), while maintaining operational stability and economically coherent price trajectories over 106 timesteps. The results support the effectiveness of the forecast–control architecture in mitigating decision myopia and capturing marginal gains at scale, offering a formally grounded and reproducible framework for price management in high-complexity environments.
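To make the augmented state and the baseline comparison concrete, the following is a minimal sketch, not the dissertation's implementation. It assumes a state built by concatenating microeconomic, macroeconomic, and forecast features, a baseline that raises the price by the period's inflation rate, and a simple relative-uplift metric. All names (`Observation`, `build_state`, `inflation_indexed_price`, `relative_uplift`) and the choice of features are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Observation:
    """One timestep of raw inputs (all field names are hypothetical)."""
    price: float                  # microeconomic: current unit price
    units_sold: float             # microeconomic: last observed demand
    inflation_rate: float         # macroeconomic: period inflation, 0.004 = 0.4%
    demand_forecast: List[float]  # forward-looking signal, e.g. from a forecaster


def build_state(obs: Observation) -> List[float]:
    """Concatenate micro, macro, and forecast features into one state vector."""
    micro = [obs.price, obs.units_sold]
    macro = [obs.inflation_rate]
    return micro + macro + list(obs.demand_forecast)


def inflation_indexed_price(price: float, inflation_rate: float) -> float:
    """Assumed baseline policy: raise the price by the period's inflation rate."""
    return price * (1.0 + inflation_rate)


def relative_uplift(agent_revenue: float, baseline_revenue: float) -> float:
    """Percentage gain of the agent's revenue over a baseline's revenue."""
    return 100.0 * (agent_revenue - baseline_revenue) / baseline_revenue


# Illustrative values only; they are not taken from the dissertation's data.
obs = Observation(price=10.0, units_sold=120.0, inflation_rate=0.004,
                  demand_forecast=[118.0, 125.0])
state = build_state(obs)
baseline_price = inflation_indexed_price(obs.price, obs.inflation_rate)
```

Under this metric, the rounded figures reported above (R$ 142.043 million against a baseline roughly R$ 1.38 million lower) give an uplift of about 0.98%, in line with the stated 0.9845% once rounding of the revenue difference is taken into account.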