Commit Graph

39 Commits

Author SHA1 Message Date
shaohuzhang1 4076988374 fix: Fix images in legacy Word documents not being recognized correctly #1533
(cherry picked from commit 22d9fdc42f)
2024-11-07 16:06:15 +08:00
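CaptainB 834ccaa35b refactor: Force PDF segmentation to respect the character limit
--bug=1047568 --user=刘瑞斌 [github#1363] Advanced segmentation of PDF files uses a default segment length of 500, but the generated segments exceeded 29,000 characters https://www.tapd.cn/57709429/s/1600183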
2024-10-29 11:44:37 +08:00
shaohuzhang1 83d97439e4 fix: Fix some images failing to import from Word documents 2024-10-28 17:44:11 +08:00
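CaptainB 76f63642e5 fix: Fix empty rows not being filtered out when importing CSV files
--bug=1047841 --user=刘瑞斌 [Knowledge Base] After uploading a CSV table template, the header in the first row is not fully displayed in the imported segments https://www.tapd.cn/57709429/s/1597113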
2024-10-24 11:13:26 +08:00
wxg0103 d5bbf48d01 style: Improve styling 2024-10-18 15:51:03 +08:00
Henry-Shaw 33d63c8efe
fix: Fix images not being recognized and imported after uploading legacy .docx files to the knowledge base (#1382) 2024-10-16 14:39:52 +08:00
CaptainB e16e827028 fix: Handle leading and trailing whitespace in text 2024-09-25 16:00:30 +08:00
CaptainB 6cacb5be71 fix: Handle non-standard PDFs whose preface is not listed in the table of contents, which prevented them from being parsed correctly 2024-09-24 12:06:51 +08:00
shaohuzhang1 885ab5410a fix: [Knowledge Base] Fix Word documents exported from Yuque importing into the knowledge base as blank #1148 2024-09-20 19:37:22 +08:00
shaohuzhang1 49efb185e0 fix: [Model Settings] Fix error when creating a model using the application's base URL 2024-09-20 18:48:10 +08:00
shaohuzhang1 fda0bcb5d6 fix: Fix partial content loss when a knowledge base is exported and then re-imported 2024-09-20 16:20:08 +08:00
CaptainB 3e3b77e34d refactor: Handle vertically merged cells 2024-09-18 12:37:33 +08:00
CaptainB 746f587698 fix: Distinguish between xls and xlsx for table data 2024-09-12 10:49:31 +08:00
shaohuzhang1 b924958176 feat: Support cell images in xlsx files when uploading table documents 2024-09-11 18:27:44 +08:00
shaohuzhang1 37445762b2 feat: Support cell images in xlsx files when uploading QA-pair documents 2024-09-11 15:55:29 +08:00
shaohuzhang1 f9a76d7948
feat: Support the OpenAI API #353 (#1128) 2024-09-09 14:47:25 +08:00
CaptainB 70f44b990c refactor: Segment well-formatted PDFs by their table of contents 2024-09-06 10:56:27 +08:00
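CaptainB 57b15a8a7f feat: Support uploading CSV and Excel files to the knowledge base
--story=1016154 --user=刘瑞斌 [Knowledge Base] Support uploading spreadsheet documents (Excel/CSV) and segmenting them by row https://www.tapd.cn/57709429/s/1567910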
2024-08-30 15:46:20 +08:00
shaohuzhang1 a9443a638c fix: Fix uploaded documents with a .PDF extension not being recognized 2024-08-27 14:16:03 +08:00
CaptainB 2a87af6172 chore: Output the error reason when parsing fails 2024-08-22 10:43:48 +08:00
shaohuzhang1 00af530d27
chore: Output the error reason when parsing fails (#996)
Co-authored-by: CaptainB <bin@fit2cloud.com>
2024-08-20 22:03:58 +08:00
CaptainB 17af603397 refactor: Optimize PDF loading and fix garbled Chinese text in some PDFs
2024-08-20 16:58:04 +08:00
CaptainB 01d8204cb5 refactor: Load PDFs page by page; save images as separate files for loading 2024-08-16 15:08:22 +08:00
CaptainB 0d59ab2be9 refactor: Load PDFs using lazy_load
2024-08-16 10:43:20 +08:00
CaptainB e266dd9d99 refactor: Support parsing images in PDFs
2024-08-15 20:53:44 +08:00
shaohuzhang1 b3c7120372
fix: Fix QA file parsing failure (#933) 2024-08-06 14:47:28 +08:00
shaohuzhang1 22e192ed11
fix: Fix parsing errors during document import (#570)
2024-05-28 17:32:29 +08:00
shaohuzhang1 efe5a2b021
fix: Fix Excel import failure (#554) 2024-05-27 16:06:14 +08:00
shaohuzhang1 e9a05b1255
fix: Fix QA knowledge base import failure (#536) 2024-05-24 17:59:02 +08:00
shaohuzhang1 28938104c0
* feat: Support uploading QA pairs in Excel/CSV format (#430)
2024-05-23 18:57:49 +08:00
shaohuzhang1 86f500208f
feat: Support uploading HTML documents #364 (#518) 2024-05-23 14:19:18 +08:00
shaohuzhang1 1f916a5c3e
feat: [Knowledge Base] Support image upload for .docx files #69 (#267) 2024-04-26 18:03:02 +08:00
shaohuzhang1 8b31fd6b36 fix: Error when segmenting files of unsupported types 2024-04-10 17:05:46 +08:00
shaohuzhang1 bd3f6e4a9b fix: Support table data in Word segmentation 2024-04-10 10:38:17 +08:00
shaohuzhang1 765c79ed9d fix: Revise the segmentation regex and improve the segmentation logic 2024-04-09 18:05:50 +08:00
shaohuzhang1 b038f12a52 fix: Increase the maximum upload document size to 100 MB 2024-04-09 15:21:40 +08:00
shaohuzhang1 11d8c6f174
fix: Fix known bugs (#30)
* fix: Client statistics are reset after refreshing the public access link

* fix: Export SQL files that had not been committed

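* fix: When creating a knowledge base from the MaxKB online documentation, only data from the root URL could be fetched; data from sub-URLs could not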
2024-04-02 19:32:04 +08:00
shaohuzhang1 16ab1f0eae
Pr@main@fix bugs (#27)
* fix: Improve Word segmentation rules

* fix: Remove special characters from titles

* fix: Fix an issue with regenerating conversation responses

---------

Co-authored-by: wangdan-fit2cloud <dan.wang@fit2cloud.com>
2024-04-01 14:39:56 +08:00
shaohuzhang1 c55bb3f6e5
Pr@main@pdf (#23)
* feat: Segmentation API supports Word and PDF

* fix: General-purpose knowledge bases support uploading PDF/DOC documents #19

---------

Co-authored-by: wangdan-fit2cloud <dan.wang@fit2cloud.com>
2024-03-29 18:28:05 +08:00