
webcrawler

Quick deployment with Docker

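Deployment from source

  1. Set up Docker following the instructions at https://github.com/searxng/searxng-docker
  2. Create a .env file in the SPIDER folder, using .env.example in that folder as a reference
  3. Enter the SPIDER folder and run pnpm install
  4. Return to the root directory and run docker compose up -d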

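Development from source

  1. In docker-compose.yaml, comment out the SPIDER-related service (nodeapp)
  2. Change the URLs in the .env file as described in its comments
  3. In the code that launches Puppeteer, comment out the line that sets the browser path
  4. Run pnpm run dev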
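If you would rather not edit the code every time you switch between Docker and local development, the browser path can be made conditional instead of commented out. This is a sketch of an alternative to step 3, not the project's code: `BROWSER_PATH` and `buildLaunchOptions` are hypothetical names, and it assumes the full `puppeteer` package, which uses its own downloaded browser when `executablePath` is omitted.

```typescript
// Sketch only: BROWSER_PATH is a hypothetical environment variable name;
// the actual SPIDER code may hard-code the path or read a different variable.
interface LaunchOptionsSketch {
  headless: boolean;
  executablePath?: string;
}

// Build options for puppeteer.launch(): include executablePath only when a
// non-blank browser path is given (e.g. the browser inside the Docker image);
// otherwise the key is left out so Puppeteer picks its default browser.
function buildLaunchOptions(browserPath: string | undefined): LaunchOptionsSketch {
  const options: LaunchOptionsSketch = { headless: true };
  if (browserPath !== undefined && browserPath.trim() !== "") {
    options.executablePath = browserPath;
  }
  return options;
}

// Usage (requires the puppeteer package and Node typings):
// const browser = await puppeteer.launch(buildLaunchOptions(process.env.BROWSER_PATH));
```

In Docker, BROWSER_PATH would point at the container's browser; locally it is simply left unset.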

Test examples:

Remember to set the Auth Bearer Token; its value is ACCESS_TOKEN from .env.
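A minimal sketch of attaching the token, assuming the service checks the standard `Authorization: Bearer <token>` header; `buildAuthHeaders` is a hypothetical helper name.

```typescript
// Build request headers for the SPIDER API.
// Assumes the standard "Authorization: Bearer <token>" form, where <token>
// is the ACCESS_TOKEN value from .env.
function buildAuthHeaders(accessToken: string): Record<string, string> {
  // Fail early instead of sending a request that will be rejected.
  if (accessToken.trim() === "") {
    throw new Error("ACCESS_TOKEN is empty; set it in .env first");
  }
  return { Authorization: `Bearer ${accessToken}` };
}

// Usage: fetch(url, { headers: buildAuthHeaders(token) })
```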

Read a single page (content is returned as HTML)

http://localhost:3000/api/read?queryUrl=<url>

Response structure:

Success:

{
    "status": 200,
    "data": {
        "title": "something here",
        "content": "something here"
    }
}

Error (the message means "Missing required parameter: query"):

{
    "status": 400,
    "error": {
        "code": "MISSING_PARAM",
        "message": "缺少必要参数: query"
    }
}
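A sketch of a TypeScript client for the read endpoint, assuming Node 18+ (global `fetch` and `URL`) and the response shapes in the examples above; all type and function names are hypothetical. Passing `queryUrl` through `URLSearchParams` percent-encodes it, so a target URL with its own query string arrives intact.

```typescript
// Hypothetical types mirroring the JSON examples for /api/read.
interface ReadSuccess {
  status: 200;
  data: { title: string; content: string };
}
interface ApiError {
  status: number;
  error: { code: string; message: string };
}
type ReadResponse = ReadSuccess | ApiError;

// Build the /api/read URL; searchParams.set percent-encodes the target URL
// so characters such as "?" and "&" inside it do not split the query string.
function buildReadUrl(baseUrl: string, queryUrl: string): string {
  const url = new URL("/api/read", baseUrl);
  url.searchParams.set("queryUrl", queryUrl);
  return url.toString();
}

// Type guard: only successful responses carry a "data" field.
function isReadSuccess(res: ReadResponse): res is ReadSuccess {
  return "data" in res;
}

// Fetch a page through the service with the Bearer token from .env.
async function readPage(baseUrl: string, queryUrl: string, accessToken: string): Promise<ReadResponse> {
  const res = await fetch(buildReadUrl(baseUrl, queryUrl), {
    headers: { Authorization: `Bearer ${accessToken}` },
  });
  return (await res.json()) as ReadResponse;
}
```

Usage: `const res = await readPage("http://localhost:3000", "https://example.com", token);` then check `isReadSuccess(res)` before reading `res.data.content`.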

Search (content is returned as HTML)

http://localhost:3000/api/search?query=<something>&pageCount=5&needDetails=true&engine=baidu

Response structure:

Success:

{
    "status": 200,
    "data": {
        "results": [
            {
                "title": "string",
                "url": "string",
                "snippet": "string",
                "source": "string",
                "crawlStatus": "string",
                "score": 0,
                "content": "string"
            }
        ]
    }
}

Error (the message means "Missing required parameter: query"):

{
    "status": 400,
    "error": {
        "code": "MISSING_PARAM",
        "message": "缺少必要参数: query"
    }
}
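A sketch of building the search request and typing its results, using the parameter names from the example URL. Which `engine` values are supported and what each parameter does beyond its name are not documented here, so the helper only forwards the options it is given. All names are hypothetical.

```typescript
// Shape of one search result, copied from the JSON example above.
interface SearchResult {
  title: string;
  url: string;
  snippet: string;
  source: string;
  crawlStatus: string;
  score: number;
  content: string;
}
interface SearchSuccess {
  status: 200;
  data: { results: SearchResult[] };
}

// Optional parameters as they appear in the example request.
interface SearchOptions {
  pageCount?: number;
  needDetails?: boolean;
  engine?: string;
}

// Build the /api/search URL; options that are not set are left out entirely.
function buildSearchUrl(baseUrl: string, query: string, opts: SearchOptions = {}): string {
  const url = new URL("/api/search", baseUrl);
  url.searchParams.set("query", query);
  if (opts.pageCount !== undefined) url.searchParams.set("pageCount", String(opts.pageCount));
  if (opts.needDetails !== undefined) url.searchParams.set("needDetails", String(opts.needDetails));
  if (opts.engine !== undefined) url.searchParams.set("engine", opts.engine);
  return url.toString();
}

// Collect the result URLs from a successful response.
function listResultUrls(res: SearchSuccess): string[] {
  return res.data.results.map((r) => r.url);
}
```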