Environment: Multi-objective Reinforcement Learning
Autonomous procurement · 3-task difficulty system · state → action → reward
step 0 Idle
Select task difficulty
Easy task
Obvious optimal decision
Most vendors active · Prices near expected · Clear winner
Expected score: ~0.85–0.95
Medium task
Trade-offs required
Some denials · Cheap-but-slow vs fast-but-costly · No obvious pick
Expected score: ~0.65–0.82
Hard task
No perfect answer
Most deny · All prices above budget · Quality vs cost conflict
Expected score: ~0.42–0.68
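The three tiers above could be captured as a small lookup table. This is only an illustrative sketch: `DifficultyTier`, `TIERS`, and `score_in_expected_band` are hypothetical names, not part of the environment itself, and the score bands are the approximate ranges listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DifficultyTier:
    """One task difficulty level (hypothetical structure, for illustration)."""
    name: str
    summary: str
    traits: tuple[str, ...]
    expected_score: tuple[float, float]  # approximate (low, high) band


# Values copied from the tier descriptions; the keys are assumed names.
TIERS = {
    "easy": DifficultyTier(
        "Easy task", "Obvious optimal decision",
        ("Most vendors active", "Prices near expected", "Clear winner"),
        (0.85, 0.95),
    ),
    "medium": DifficultyTier(
        "Medium task", "Trade-offs required",
        ("Some denials", "Cheap-but-slow vs fast-but-costly", "No obvious pick"),
        (0.65, 0.82),
    ),
    "hard": DifficultyTier(
        "Hard task", "No perfect answer",
        ("Most deny", "All prices above budget", "Quality vs cost conflict"),
        (0.42, 0.68),
    ),
}


def score_in_expected_band(tier_key: str, score: float) -> bool:
    """Return True if a score falls inside the tier's approximate band."""
    low, high = TIERS[tier_key].expected_score
    return low <= score <= high
```

A check such as `score_in_expected_band("medium", 0.70)` would then flag whether an agent's score is in line with the stated expectation for that tier. The medium and hard bands overlap between 0.65 and 0.68, so a score alone does not identify the tier.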
Vendor dynamics:
Procurement parameters
RL state — episode snapshot
Episode step
Vendors remaining
Deals closed
Best price
Budget headroom
Cumulative reward
0.000
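The snapshot fields listed above map naturally onto a small record type. The following is a minimal sketch under stated assumptions: the class name `EpisodeSnapshot`, its field types, and the `record_step` helper are hypothetical and do not describe how the environment actually stores its state.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EpisodeSnapshot:
    """Hypothetical container mirroring the episode snapshot panel."""
    episode_step: int = 0
    vendors_remaining: int = 0
    deals_closed: int = 0
    best_price: Optional[float] = None  # None until a quote has been seen
    budget_headroom: float = 0.0
    cumulative_reward: float = 0.0

    def record_step(self, reward: float) -> None:
        """Advance one step and accumulate the reward received (assumed update rule)."""
        self.episode_step += 1
        self.cumulative_reward += reward

    def format_reward(self) -> str:
        """Render the cumulative reward with three decimals, as the panel does."""
        return f"{self.cumulative_reward:.3f}"
```

Starting from `EpisodeSnapshot(vendors_remaining=10)`, the formatted reward reads `0.000`, matching the idle panel before the agent runs.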
Live action feed
Policy decision trace
Run the agent to see policy decisions.
Total: 10 · Active: 0 · Denied: 0 · Deals: 0
# | Vendor | Price ₹ | Del. | Quality | Reliability | Margin | Status | Rating

Run the agent to generate results.