Commit Graph

  • 9fc61ca82b configure for group < 10 trace grpo training master yuyr 2025-07-01 09:50:29 +08:00
  • 4295f30f9a update direct stepwise train, pass training with mock one step trace duplicated 52 times; reward OK, curve improved as the step grows. yuyr 2025-06-30 15:06:38 +00:00
  • 5e632c53ac temp save; temp hold on for rejection sample CoT; try direct RL with raw stepwise data first yuyr 2025-06-30 10:29:00 +00:00
  • 7f4fc8b05b use --system for system prompt yuyr 2025-06-26 22:46:09 +00:00
  • ee08da12c0 pass sample test web with custom orm yuyr 2025-06-26 18:02:13 +00:00
  • 8f168ecbef test swift for qwen3 math yuyr 2025-06-25 17:00:47 +08:00