swift_test

Author	SHA1	Message	Date
yuyr	9fc61ca82b	configure for group < 10 trace grpo training	2025-07-01 09:50:29 +08:00
yuyr	4295f30f9a	update direct stepwise training; training passes with a mock one-step trace duplicated 52 times; reward OK, curve improves as the step count grows.	2025-06-30 15:06:38 +00:00
yuyr	5e632c53ac	temp save; put rejection-sampling CoT on hold for now; try direct RL with raw stepwise data first	2025-06-30 10:29:00 +00:00
yuyr	7f4fc8b05b	use --system for system prompt	2025-06-26 22:46:09 +00:00
yuyr	ee08da12c0	pass sample test web with custom orm	2025-06-26 18:02:13 +00:00
yuyr	8f168ecbef	test swift for qwen3 math	2025-06-25 17:00:47 +08:00