# MVP v3.8 Progress Log
## 2026-01-06
- Completed the v3.8 design document: `specs/mvp/v3.8/v3.8_design.md`
- Completed the v3.8 Serving API reference: `specs/mvp/v3.8/v3.8_api.md`
- Completed the v3.8 TDD development plan: `specs/mvp/v3.8/v3.8_dev_plan.md`
- Completed M0: added a `serving` section to `configs/dev.yaml` (http_port=8000, proxy_location=HeadOnly, accelerator_type=H20)
- Completed M1: ServingSpec parsing, macro substitution, and path validation, with unit tests (`src/mvp/py/argus/service/serving_spec.py`)
- Completed M2: added the `serve_models`/`serve_events` tables to SQLite, plus the Db API and unit tests (`src/mvp/py/argus/service/db.py`)
- Completed M3: FastAPI Serving management API, with unit tests (`src/mvp/py/argus/service/app.py`)
- Completed M4: ServeClient abstraction and LLMConfig builder (dict form), with unit tests (`src/mvp/py/argus/service/serve_client.py`, `src/mvp/py/argus/service/serve_llm_config.py`)
- Completed M5: Serving reconciler (state machine, resource pre-check, mock unit tests) (`src/mvp/py/argus/service/serving_reconciler.py`)
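
As an illustration of the M0 settings, the sketch below models the `serving` section as a small validated Python object. The class `ServingConfig` and the function `load_serving_config` are hypothetical names introduced here, not the project's actual code; only the three keys and their values come from the log above. The set of accepted `proxy_location` values follows Ray Serve's documented options.

```python
from dataclasses import dataclass
from typing import Any, Dict

# Proxy locations accepted by Ray Serve.
VALID_PROXY_LOCATIONS = {"Disabled", "HeadOnly", "EveryNode"}


@dataclass(frozen=True)
class ServingConfig:
    """Hypothetical in-memory form of the `serving` section in configs/dev.yaml."""

    http_port: int = 8000
    proxy_location: str = "HeadOnly"
    accelerator_type: str = "H20"

    def __post_init__(self) -> None:
        # Reject values that could only fail later, at deploy time.
        if not 1 <= self.http_port <= 65535:
            raise ValueError(f"http_port out of range: {self.http_port}")
        if self.proxy_location not in VALID_PROXY_LOCATIONS:
            raise ValueError(f"unknown proxy_location: {self.proxy_location}")


def load_serving_config(raw: Dict[str, Any]) -> ServingConfig:
    """Build a ServingConfig from an already-parsed config dict (assumed shape)."""
    return ServingConfig(**raw.get("serving", {}))
```

Validating at load time keeps a mistyped `proxy_location` from surfacing only when Serve starts.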
### M6 (real integration on h1)
- Added missing dependencies to the `argus-ray-node` image: `ray[serve,llm]` + `gymnasium` + `dm-tree` (avoids `ray.serve.llm` import failures)
- Fixed a Ray 2.49.2 compatibility issue:
  - `LLMConfig` does not support `placement_group_config`; switched to `resources_per_bundle` (`src/mvp/py/argus/service/serve_llm_config.py`)
- Remote E2E:
  - `scripts/run_all_v38_serving.sh` runs end to end: create → RUNNING → `/v1/models` → `chat/completions` → delete → DELETED
  - Fixed a bash heredoc quoting error in the script's `/v1/models` parsing (`src/mvp/scripts/run_all_v38_serving.sh`)
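
The compatibility fix above can be pictured with a dict-form builder that emits `resources_per_bundle` and never emits `placement_group_config`. This is a minimal sketch, not the contents of `serve_llm_config.py`: the function name `build_llm_config_dict`, its parameters, and the exact dict layout are assumptions, and the other keys should be checked against the `LLMConfig` fields of the installed Ray version.

```python
from typing import Any, Dict, Optional


def build_llm_config_dict(
    model_id: str,
    model_source: str,
    accelerator_type: str,
    resources_per_bundle: Optional[Dict[str, float]] = None,
    engine_kwargs: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical builder returning an LLMConfig-shaped dict.

    Bundle resources are expressed via `resources_per_bundle`; the
    `placement_group_config` key is deliberately never produced, since the
    log reports it as unsupported on Ray 2.49.2.
    """
    cfg: Dict[str, Any] = {
        "model_loading_config": {
            "model_id": model_id,
            "model_source": model_source,
        },
        "accelerator_type": accelerator_type,
        "engine_kwargs": dict(engine_kwargs or {}),
    }
    if resources_per_bundle:
        # Copy so later mutation by the caller does not leak into the config.
        cfg["resources_per_bundle"] = dict(resources_per_bundle)
    return cfg
```

Keeping the builder dict-only means it can be unit-tested without importing `ray.serve.llm`.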
### M7 (WebUI - Serving)
- Added Serving pages to the WebUI:
  - List: `/ui/serving`
  - Create: `/ui/serving/new`
  - Details/events/scaling/deletion: `/ui/serving/{model_key}`
- Unit test coverage:
  - `src/mvp/py/tests/test_ui_serving.py`
### M8 (docs/acceptance)
- Added notes on the v3.8 serving port and the E2E script to `src/mvp/README.md`
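### Environment probe (h1 / head container)
> Purpose: confirm whether the Ray Serve LLM dependencies work out of the box, so that problems do not surface only during the later integration phase.

- `ray`: available, version `2.49.2`
- `ray.serve`: importable (basic Serve functionality is available)
- `ray.serve.llm`: currently not importable
  - Error: `ModuleNotFoundError: No module named 'gymnasium'`
  - Cause: the import chain of `ray.serve.llm` triggers `ray.rllib`, and rllib depends on `gymnasium`

Conclusion:
- During implementation, v3.8 needs to add `ray[llm]` (recommended), or at least `gymnasium` and the other required dependencies, to the `argus-ray-node` image so that `from ray.serve.llm import ...` works.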
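
A probe like the one described above can be reproduced with a few lines of Python. The sketch below is one possible way to run it and is not the script actually used on h1; the helper name `probe_imports` is hypothetical.

```python
import importlib
from typing import Dict, Iterable, Optional


def probe_imports(module_names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Try to import each module; map its name to None on success or an error string."""
    results: Dict[str, Optional[str]] = {}
    for name in module_names:
        try:
            importlib.import_module(name)
        except Exception as exc:  # transitive import failures can be any exception type
            results[name] = f"{type(exc).__name__}: {exc}"
        else:
            results[name] = None
    return results


if __name__ == "__main__":
    # Modules named in the probe above.
    for name, err in probe_imports(["ray", "ray.serve", "ray.serve.llm", "gymnasium"]).items():
        print(f"{name}: {'ok' if err is None else err}")
```

Running this inside the head container before and after rebuilding the image gives a quick check that `ray.serve.llm` has become importable.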