| 9fc61ca82b | configure for group < 10 trace grpo training | 2025-07-01 09:50:29 +08:00 |
| 4295f30f9a | update direct stepwise train, pass training with mock one step trace duplicated 52 times; reward OK, curve improved as the step grows. | 2025-06-30 15:06:38 +00:00 |
| 5e632c53ac | temp save; temp hold on for rejection sample CoT; try direct RL with raw stepwise data first | 2025-06-30 10:29:00 +00:00 |
| 7f4fc8b05b | use --system for system prompt | 2025-06-26 22:46:09 +00:00 |
| ee08da12c0 | pass sample test web with custom orm | 2025-06-26 18:02:13 +00:00 |
| 8f168ecbef | test swift for qwen3 math | 2025-06-25 17:00:47 +08:00 |