Commit Graph

3 Commits

chanzhi82020 31c17999b8
This PR introduces evaluation support designed specifically to track and benchmark applications built on the FastGPT platform. (#5476)
- Adds a lightweight evaluation framework for app-level tracking and benchmarking.
- Changes: 28 files changed, 1455 insertions(+), 66 deletions(-).
- Branch: add-evaluations -> main.
- PR: https://github.com/chanzhi82020/FastGPT/pull/1

Applications built on FastGPT need repeatable, comparable benchmarks to measure regressions, track improvements, and validate releases. This initial implementation provides the primitives to define evaluation scenarios, run them against app endpoints or model components, and persist results for later analysis.

The description below emphasizes that the evaluation system targets FastGPT-built apps and expands on the core pieces so reviewers understand the scope and intended use. It covers the feature intent, the core components, and how results are captured and aggregated for benchmarking.

- Evaluation definitions
  - Define evaluation tasks that reference an app (app id, version, endpoint), test datasets or input cases, expected outputs (when applicable), and run configuration (parallelism, timeouts); a possible shape is sketched after this list.
  - Support for custom metric plugins so teams can add domain-specific measures.
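
A minimal TypeScript sketch of what such a definition could look like; every field name here is an illustrative assumption, not the actual schema shipped in this PR:

```typescript
// Hypothetical evaluation definition; all names are illustrative.
type EvalCase = {
  input: string;      // request payload sent to the app
  expected?: string;  // expected output, when applicable
};

type EvalDefinition = {
  appId: string;       // FastGPT app under test
  appVersion: string;  // version/deployment being benchmarked
  endpoint: string;    // app endpoint or internal model interface
  cases: EvalCase[];   // test dataset / input cases
  run: {
    parallelism: number; // concurrent case executions
    timeoutMs: number;   // per-case timeout
  };
  metrics: string[];   // built-in metrics or custom plugin names
};

// Example: a small smoke-test evaluation.
const smokeEval: EvalDefinition = {
  appId: "app_abc123",
  appVersion: "v1.0.0",
  endpoint: "https://example.com/api/v1/chat/completions",
  cases: [{ input: "What does FastGPT do?", expected: "FastGPT builds AI apps." }],
  run: { parallelism: 4, timeoutMs: 30_000 },
  metrics: ["success_rate", "latency"],
};
```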

- Runner / Executor
  - Executes evaluation cases against app endpoints or internal model interfaces.
  - Captures raw responses, response times, status codes, and any runtime errors.
  - Computes per-case metrics (e.g., correctness, latency) immediately after each case run; a minimal runner loop is sketched below.
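
A minimal runner sketch, assuming one HTTP POST per case via Node 18+ `fetch`; the executor in this PR may work differently:

```typescript
// Sketch only: executes one case against an app endpoint and captures the
// raw response, latency, status code, and any runtime error.
type CaseResult = {
  input: string;
  output?: string;
  status?: number;
  latencyMs: number;
  error?: string;
};

async function runCase(
  endpoint: string,
  input: string,
  timeoutMs: number
): Promise<CaseResult> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  const start = Date.now();
  try {
    const res = await fetch(endpoint, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ input }),
      signal: controller.signal,
    });
    const output = await res.text();
    return { input, output, status: res.status, latencyMs: Date.now() - start };
  } catch (err) {
    // Timeouts surface here as an AbortError.
    return { input, latencyMs: Date.now() - start, error: String(err) };
  } finally {
    clearTimeout(timer);
  }
}
```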

- Metrics & Aggregation
  - Built-in metrics: accuracy/success rate, latency (p50/p90/p99), throughput, error rate.
  - Aggregation produces per-run summaries and per-app historical summaries for trend analysis.
  - Allows combining metrics into composite scores for high-level benchmarking (see the aggregation sketch below).
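
For the percentile aggregation, a nearest-rank sketch; the composite weighting shown is an arbitrary example, not a formula defined by this PR:

```typescript
// Nearest-rank percentile over a pre-sorted array of latencies (ms).
function percentile(sortedMs: number[], p: number): number {
  if (sortedMs.length === 0) return 0;
  const idx = Math.max(0, Math.ceil((p / 100) * sortedMs.length) - 1);
  return sortedMs[Math.min(idx, sortedMs.length - 1)];
}

// Per-run summary from raw case results.
function summarize(latencies: number[], successes: number, total: number) {
  const sorted = [...latencies].sort((a, b) => a - b);
  const successRate = total === 0 ? 0 : successes / total;
  const p90 = percentile(sorted, 90);
  return {
    successRate,
    errorRate: 1 - successRate,
    p50: percentile(sorted, 50),
    p90,
    p99: percentile(sorted, 99),
    // Example composite: weight correctness over speed (weights are assumptions).
    composite: 0.7 * successRate + 0.3 * (1 / (1 + p90 / 1000)),
  };
}
```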

- Persistence & Logging
  - Stores run results, input/output pairs (when needed), timestamps, environment info, and app/version metadata so runs are reproducible and auditable; a possible record shape is sketched below.
  - Logs are retained to facilitate debugging and root-cause analysis of regressions.
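
A hypothetical shape for a persisted run record, covering the metadata listed above; all names are assumptions:

```typescript
// Hypothetical persisted run record for reproducible, auditable runs.
type EvalRunRecord = {
  runId: string;
  appId: string;
  appVersion: string;
  startedAt: string;  // ISO-8601 timestamps
  finishedAt: string;
  environment: {
    nodeVersion: string;  // enough context to reproduce the run
    deployment?: string;
  };
  cases: Array<{
    input: string;
    output?: string;  // input/output pairs stored only when needed
    latencyMs: number;
    error?: string;
  }>;
  summary: Record<string, number>;  // aggregated metrics for trend queries
};
```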

- Reporting & Comparison
  - Produces aggregated reports suitable for CI gating, release notes, or dashboards (a gating sketch follows this list).
  - Supports comparing multiple app versions or deployments side-by-side.
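
As a sketch of CI gating on such a report, comparing a candidate run's summary against a baseline; the thresholds here are invented examples:

```typescript
// Fail the gate when the candidate regresses beyond tolerated thresholds.
function gate(
  baseline: Record<string, number>,
  candidate: Record<string, number>,
  maxP90Regression = 0.1, // allow p90 latency to grow at most 10%
  minSuccessRate = 0.95
): { pass: boolean; reasons: string[] } {
  const reasons: string[] = [];
  if (candidate.successRate < minSuccessRate) {
    reasons.push(`successRate ${candidate.successRate} below ${minSuccessRate}`);
  }
  if (candidate.p90 > baseline.p90 * (1 + maxP90Regression)) {
    reasons.push(`p90 regressed: ${baseline.p90}ms -> ${candidate.p90}ms`);
  }
  return { pass: reasons.length === 0, reasons };
}
```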

- Extensibility & Integration
  - Designed to plug into CI (automated runs on PRs or releases), dashboards, and downstream analysis tools.
  - Easy to add new metrics, evaluators, or dataset connectors; a plugin sketch follows below.
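
One way the metric-plugin extension point could be expressed; the interface and registry here are assumptions for illustration:

```typescript
// A custom metric scores one case; domain-specific plugins implement this.
interface MetricPlugin {
  name: string;
  score(args: { input: string; output?: string; expected?: string }): number;
}

// Example plugin: exact string match scored in [0, 1].
const exactMatch: MetricPlugin = {
  name: "exact_match",
  score: ({ output, expected }) =>
    expected !== undefined && output?.trim() === expected.trim() ? 1 : 0,
};

// Hypothetical registry the runner could consult when computing metrics.
const metricRegistry = new Map<string, MetricPlugin>();
metricRegistry.set(exactMatch.name, exactMatch);
```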

By centering the evaluation system on FastGPT apps, teams can benchmark full application behavior (not only raw model outputs), correlate metrics with deployment configurations, and make informed release decisions.

Planned follow-ups:

- Expand the built-in metric suite (e.g., F1, BLEU/ROUGE where applicable), add dataset connectors, and provide example evaluation scenarios for sample apps.
- Integrate with CI pipelines and add basic dashboarding for trend visualization.

Related Issue: N/A

Co-authored-by: Archer <545436317@qq.com>
2025-09-16 15:20:59 +08:00
Archer 13b7e0a192
V4.11.0 features (#5270)
* feat: workflow catch error (#5220)

* feat: error catch

* feat: workflow catch error

* perf: add catch error to node

* feat: system tool error catch

* catch error

* fix: ts

* update doc

* perf: training queue code (#5232)

* doc

* perf: training queue code

* Feat: improve error messages and retry logic (#5192)

* feat: batch retry for failed data & i18n for error messages

  - Added a "Retry all" button to batch-retry all failed training data
  - Error messages now support i18n; common errors are automatically mapped to i18n keys
  - Related docs and i18n resources have been updated accordingly

* feat: enhance error message and retry mechanism

* feat: enhance error message and retry mechanism

* feat: add retry_failed i18n key

* feat: enhance error message and retry mechanism

* feat: enhance error message and retry mechanism

* feat: enhance error message and retry mechanism : 5

* feat: enhance error message and retry mechanism : 6

* feat: enhance error message and retry mechanism : 7

* feat: enhance error message and retry mechanism : 8

* perf: catch chat error

* perf: copy hook (#5246)

* perf: copy hook

* doc

* doc

* add app evaluation (#5083)

* add app evaluation

* fix

* usage

* variables

* editing condition

* var ui

* isplus filter

* migrate code

* remove utils

* name

* update type

* build

* fix

* fix

* fix

* delete comment

* fix

* perf: eval code

* eval code

* eval code

* feat: ttfb time in model log

* Refactor chat page (#5253)

* feat: update side bar layout; add login and logout logic at chat page

* refactor: encapsulate login logic and reuse it in `LoginModal` and `Login` page

* chore: improve some logics and comments

* chore: improve some logics

* chore: remove redundant side effect; add translations

---------

Co-authored-by: Archer <545436317@qq.com>

* perf: chat page code

* doc

* perf: provider redirect

* chore: ui improvement (#5266)

* Fix: SSE

* Fix: SSE

* eval pagination (#5264)

* eval scroll pagination

* change eval list to manual pagination

* number

* fix build

* fix

* version doc (#5267)

* version doc

* version doc

* doc

* feat: eval model select

* config eval model

* perf: eval detail modal ui

* doc

* doc

* fix: chat store reload

* doc

---------

Co-authored-by: colnii <1286949794@qq.com>
Co-authored-by: heheer <heheer@sealos.io>
Co-authored-by: 酒川户 <76519998+chuanhu9@users.noreply.github.com>
2025-07-22 09:42:50 +08:00
Archer 80a84a5733
Change embedding (#1463)
* rebuild embedding queue

* dataset menu

* feat: rebuild data api

* feat: ui change embedding model

* dataset ui

* feat: rebuild index ui

* rename collection
2024-05-13 14:51:42 +08:00