Environment: Multi-objective Reinforcement Learning

Task difficulty

Easy task

Obvious optimal decision

Most vendors active, prices near the expected price, clear winner

Expected score: ~0.85–0.95

Medium task

Trade-offs required

Some denials, cheap-but-slow vs. fast-but-costly offers, no obvious pick

Expected score: ~0.65–0.82

Hard task

No perfect answer

Most vendors deny, all prices above budget, quality vs. cost conflict

Expected score: ~0.42–0.68
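The difficulty levels differ mainly in how strongly the objectives pull against each other. As a minimal sketch of that trade-off, assuming a weighted-sum scalarization (the environment may combine its objectives differently), the code below scores two hypothetical offers. `Offer`, `scalarize`, the normalizations, and both weight sets are illustrative assumptions, not part of the environment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    price: float          # quoted price, ₹/kg
    delivery_days: float  # promised delivery time, days
    quality: float        # normalized to [0, 1], higher is better
    reliability: float    # normalized to [0, 1], higher is better


# Hypothetical weights (illustrative only). They sum to 1, so scores stay in [0, 1].
DEFAULT_WEIGHTS = {"price": 0.4, "delivery": 0.2, "quality": 0.2, "reliability": 0.2}
PRICE_HEAVY = {"price": 0.7, "delivery": 0.1, "quality": 0.1, "reliability": 0.1}


def _clamp01(x: float) -> float:
    return max(0.0, min(1.0, x))


def scalarize(offer: Offer, max_budget: float, max_days: float,
              weights: dict[str, float] = DEFAULT_WEIGHTS) -> float:
    """Weighted-sum scalarization of four objectives into a single score."""
    terms = {
        # Fraction of the per-kg budget saved; 0 once the price reaches the budget.
        "price": _clamp01((max_budget - offer.price) / max_budget),
        # Faster delivery scores higher; 0 at or beyond max_days.
        "delivery": _clamp01(1.0 - offer.delivery_days / max_days),
        "quality": _clamp01(offer.quality),
        "reliability": _clamp01(offer.reliability),
    }
    return sum(weights[k] * terms[k] for k in terms)


cheap_slow = Offer(price=70, delivery_days=8, quality=0.7, reliability=0.8)
fast_costly = Offer(price=90, delivery_days=2, quality=0.9, reliability=0.9)
print(round(scalarize(cheap_slow, 100, 10), 2))                # 0.46
print(round(scalarize(fast_costly, 100, 10), 2))               # 0.56
print(round(scalarize(cheap_slow, 100, 10, PRICE_HEAVY), 2))   # 0.38
print(round(scalarize(fast_costly, 100, 10, PRICE_HEAVY), 2))  # 0.33
```

With the default weights the fast-but-costly offer wins, while price-heavy weights flip the ranking toward the cheap-but-slow one. This is the kind of "no obvious pick" situation the medium task describes.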

Vendor dynamics:

Procurement parameters

Item

Expected price (₹/kg)

Max budget (₹/kg, hidden from the agent)

Quantity (kg)

Agent speed
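The parameters above split into what the agent can observe and what it cannot. A minimal sketch, assuming the hidden max budget is simply withheld from the agent's observation. The class name, field names, the example item, and the meaning of `agent_speed` (treated here as a unitless multiplier) are hypothetical.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ProcurementParams:
    item: str
    expected_price: float     # ₹/kg, visible to the agent
    max_budget: float         # ₹/kg, hidden from the agent
    quantity_kg: float
    agent_speed: float = 1.0  # hypothetical: unitless step-rate multiplier

    def observation(self) -> dict:
        """Return only the fields the agent may observe (drops max_budget)."""
        obs = asdict(self)
        del obs["max_budget"]
        return obs

    def total_budget(self) -> float:
        """Hypothetical helper: hidden per-kg budget times quantity, in ₹."""
        return self.max_budget * self.quantity_kg


params = ProcurementParams(item="onions", expected_price=30.0,
                           max_budget=34.0, quantity_kg=500.0)
print(params.observation())
# {'item': 'onions', 'expected_price': 30.0, 'quantity_kg': 500.0, 'agent_speed': 1.0}
print(params.total_budget())  # 17000.0
```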

RL state: episode snapshot

Episode step

Vendors remaining

Deals closed

Best price

Budget headroom

Cumulative reward (starts at 0.000)
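The snapshot fields can be kept as a small immutable record that is updated once per step. This is a sketch under stated assumptions: `EpisodeState`, `record_step`, the rule that a vendor leaves "Vendors remaining" only when flagged by the caller, and the definition of budget headroom as the fraction of the hidden per-kg budget left at the best price are all hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class EpisodeState:
    step: int = 0
    vendors_remaining: int = 0
    deals_closed: int = 0
    best_price: Optional[float] = None  # lowest closed-deal price so far, ₹/kg
    cumulative_reward: float = 0.0

    def budget_headroom(self, max_budget: float) -> Optional[float]:
        """Hypothetical: fraction of the hidden budget left unspent at best_price."""
        if self.best_price is None:
            return None
        return (max_budget - self.best_price) / max_budget


def record_step(state: EpisodeState, reward: float, vendor_left: bool,
                deal_price: Optional[float] = None) -> EpisodeState:
    """Advance the snapshot by one step (illustrative transition only)."""
    best = state.best_price
    if deal_price is not None:
        best = deal_price if best is None else min(best, deal_price)
    return replace(
        state,
        step=state.step + 1,
        vendors_remaining=state.vendors_remaining - (1 if vendor_left else 0),
        deals_closed=state.deals_closed + (1 if deal_price is not None else 0),
        best_price=best,
        cumulative_reward=state.cumulative_reward + reward,
    )


s = EpisodeState(vendors_remaining=5)
s = record_step(s, reward=0.1, vendor_left=True)  # e.g. a vendor denies
s = record_step(s, reward=0.5, vendor_left=False, deal_price=32.0)
print(s.step, s.vendors_remaining, s.deals_closed, s.best_price)  # 2 4 1 32.0
```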

Live action feed

Policy decision trace


Summary counts: Total, Active, Denied, Deals

#	Vendor	Price (₹/kg)	Delivery	Quality	Reliability	Margin	Status	Rating
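One way to back the results table and the Total / Active / Denied / Deals counters is a list of per-vendor rows. In this sketch, `VendorRow`, `Status`, `summary`, the three mutually exclusive statuses, and the definition of Margin as the percentage of the per-kg budget left after the quoted price are assumptions. The Rating column is omitted.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    ACTIVE = "active"
    DENIED = "denied"
    DEAL = "deal"


@dataclass
class VendorRow:
    name: str
    price: float          # quoted price, ₹/kg
    delivery_days: float
    quality: float        # 0..1
    reliability: float    # 0..1
    status: Status

    def margin(self, max_budget: float) -> float:
        """Hypothetical: % of the per-kg budget left after this price (negative = over budget)."""
        return 100.0 * (max_budget - self.price) / max_budget


def summary(rows: list[VendorRow]) -> dict[str, int]:
    """Counts behind the Total / Active / Denied / Deals tiles (assumed semantics)."""
    counts = Counter(r.status for r in rows)
    return {
        "total": len(rows),
        "active": counts[Status.ACTIVE],
        "denied": counts[Status.DENIED],
        "deals": counts[Status.DEAL],
    }


rows = [
    VendorRow("A", 31.0, 5, 0.8, 0.9, Status.ACTIVE),
    VendorRow("B", 36.0, 2, 0.9, 0.7, Status.DENIED),
    VendorRow("C", 33.0, 3, 0.7, 0.8, Status.DEAL),
]
print(summary(rows))  # {'total': 3, 'active': 1, 'denied': 1, 'deals': 1}
```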
