Commit Graph

73 Commits

Author SHA1 Message Date
shaohuzhang1 35b662a52d
perf: Optimize document extraction for complex table files (#3116)
2025-05-20 13:44:20 +08:00
CaptainB 8d503c8bf8 fix: update post_cell function to handle different newline characters in cell values
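--bug=1054683 --user=刘瑞斌 [github#2831] When an Excel file is uploaded to the knowledge base or to the document content extraction node in application orchestration, cells containing line breaks are not displayed as a single cell after import https://www.tapd.cn/57709429/s/1690232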
2025-04-24 16:05:09 +08:00
shaohuzhang1 0c14306889
fix: Docx segmented font title recognition (#2949) 2025-04-22 14:51:45 +08:00
CaptainB 3b24373cd0 fix: handle line breaks in cell content for markdown table formatting
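--bug=1054683 --user=刘瑞斌 [github#2831] When an Excel file is uploaded to the knowledge base or to the document content extraction node in application orchestration, cells containing line breaks are not displayed as a single cell after import https://www.tapd.cn/57709429/s/1685274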
2025-04-14 14:21:51 +08:00
CaptainB 560890f717 fix: limit chapter title length to 256 characters in pdf_split_handle.py
--bug=1054363 --user=刘瑞斌 [Knowledge base] When importing a PDF document, overlong segment titles are not truncated automatically https://www.tapd.cn/57709429/s/1681044
2025-04-07 10:54:59 +08:00
CaptainB 675adeeb63 fix: exclude macOS specific files from zip processing
--bug=1054264 --user=刘瑞斌 [Knowledge base] In QA pair mode, importing a zip file compressed on a Mac produces 2 garbled documents https://www.tapd.cn/57709429/s/1681034
2025-04-07 10:37:06 +08:00
CaptainB 27bc01d442 fix: skip macOS specific metadata directories and files in zip parsing
--bug=1054264 --user=刘瑞斌 [Knowledge base] In QA pair mode, importing a zip file compressed on a Mac produces 2 garbled documents https://www.tapd.cn/57709429/s/1679674
2025-04-02 16:06:36 +08:00
shaohuzhang1 9750c6d605
fix: garbled zip import file names (#2747) 2025-03-31 16:22:39 +08:00
shaohuzhang1 55cdd0a708
fix: Zip with title cannot be parsed (#2683) 2025-03-26 10:31:31 +08:00
shaohuzhang1 5ec94860b2
perf: Enhance Word parsing (#2612) 2025-03-19 12:04:43 +08:00
shaohuzhang1 e420a01e0d
fix: Enterprise WeChat docking sub application cannot output thinking process (#2489) 2025-03-04 19:31:49 +08:00
shaohuzhang1 8c45e92ee4
feat: The OpenAI interface supports the thought process (#2392) 2025-02-25 14:22:51 +08:00
CaptainB c524fbc0e4 fix: Fix excel merge cells header 2025-02-14 10:26:18 +08:00
CaptainB 89c08b4bb0 fix: Filter blank sheet
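--bug=1052097 --user=刘瑞斌 [github#2196] [Application orchestration] Uploading a spreadsheet that contains a blank sheet during an application conversation raises an error https://www.tapd.cn/57709429/s/1653414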
2025-02-11 15:17:24 +08:00
shaohuzhang1 f16f417bd5
fix: The knowledge base table file upload is missing a header (#2185) 2025-02-10 10:22:23 +08:00
wxg0103 b90995d3aa fix: defect of incorrect document names after importing CSV and docx files into the knowledge base
--bug=1052039 --user=王孝刚 [Knowledge base] When an archive contains csv or docx files, document names include the folder name after import into the knowledge base https://www.tapd.cn/57709429/s/1651752
2025-02-08 16:00:57 +08:00
shaohuzhang1 a3d6083188
fix: XLS, XLSX, CSV file upload lost data (#2150) 2025-02-07 15:13:14 +08:00
wxg0103 c5585da57d feat: i18n 2025-01-14 09:46:21 +08:00
shaohuzhang1 a28de6feaf
feat: i18n (#2011) 2025-01-13 11:15:51 +08:00
shaohuzhang1 d9df013e33
fix: Part of the docx document is parsed incorrectly (#1981)
2025-01-06 14:37:51 +08:00
shaohuzhang1 832b0dbd63
feat: Knowledge base import supports zip, xls, xlsx, and csv formats, while knowledge base export supports zip format (#1869) 2024-12-18 18:00:19 +08:00
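CaptainB fb8b96779c fix: Handle PDFs that lack a table of contents and internal links and therefore could not be fully imported 2024-12-06 10:49:37 +08:00
CaptainB 7346ef6a2c fix: Filter blank sheets
--bug=1049943 --user=刘瑞斌 [Document content extraction] An error is raised when a sheet in the uploaded Excel file is empty https://www.tapd.cn/57709429/s/1625062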
2024-12-04 16:30:43 +08:00
shaohuzhang1 6b4cee1412
fix: Fix conversations returning no data when called via the API (#1755) 2024-12-04 14:19:37 +08:00
shaohuzhang1 b6c65154c5
fix: Fix sub-application form calls failing (#1741) 2024-12-03 15:23:53 +08:00
shaohuzhang1 b8aa4756c5
fix: Fix workflow node output and related issues (#1716)
2024-11-29 19:26:16 +08:00
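CaptainB 78cd949f43 fix: Fix images in uploaded xlsx files not being shown in document extraction 2024-11-29 16:28:34 +08:00
CaptainB 2a07a50a60 fix: Fix images in doc files not being saved or displayed during document extraction 2024-11-28 16:17:23 +08:00
CaptainB f638abdea2 fix: Fix images in doc files not being saved or displayed during document extraction 2024-11-28 15:07:21 +08:00
CaptainB 59f5c8ac76 fix: Fix document extraction errors not being displayed 2024-11-27 12:20:16 +08:00
CaptainB 64e8f4dc9f chore: Output an error message when document content cannot be extracted 2024-11-22 17:56:07 +08:00
CaptainB e1df4b2857 fix: Handle \0 characters in PDFs that raise "Null characters are not allowed"
--bug=1048190 --user=刘瑞斌 [Knowledge base] Uploading a PDF document raises an error; related issue #1468 https://www.tapd.cn/57709429/s/1611070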
2024-11-18 12:46:37 +08:00
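CaptainB 10e53f08e2 feat: Advanced orchestration supports file upload (WIP) 2024-11-14 14:24:36 +08:00
CaptainB b57a619bdb feat: Advanced orchestration supports file upload (WIP) 2024-11-14 13:36:16 +08:00
shaohuzhang1 22d9fdc42f fix: Fix images in legacy Word documents not being recognized #1533 2024-11-06 14:20:10 +08:00
CaptainB 834ccaa35b refactor: Enforce the character limit when segmenting PDFs
--bug=1047568 --user=刘瑞斌 [github#1363] Advanced segmentation of PDF files defaults to a segment length of 500, but the generated paragraphs exceed 29000 characters https://www.tapd.cn/57709429/s/1600183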
2024-10-29 11:44:37 +08:00
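shaohuzhang1 83d97439e4 fix: Fix some images failing to import from Word documents 2024-10-28 17:44:11 +08:00
CaptainB 76f63642e5 fix: Fix blank lines not being filtered when importing csv
--bug=1047841 --user=刘瑞斌 [Knowledge base] After uploading a csv table template, the first-row header is shown incompletely in the imported segments https://www.tapd.cn/57709429/s/1597113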
2024-10-24 11:13:26 +08:00
wxg0103 d5bbf48d01 style: Improve styles 2024-10-18 15:51:03 +08:00
Henry-Shaw 33d63c8efe
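fix: Fix images not being recognized and imported after uploading legacy docx files to the knowledge base (#1382) 2024-10-16 14:39:52 +08:00
CaptainB e16e827028 fix: Strip leading and trailing whitespace from text 2024-09-25 16:00:30 +08:00
CaptainB 6cacb5be71 fix: Handle malformed PDFs whose preface is not listed in the table of contents and therefore was not recognized 2024-09-24 12:06:51 +08:00
shaohuzhang1 885ab5410a fix: Fix [Knowledge base] Word files exported from Yuque being blank after import into the knowledge base #1148 2024-09-20 19:37:22 +08:00
shaohuzhang1 49efb185e0 fix: Fix [Model settings] error when creating a model with the application baseurl 2024-09-20 18:48:10 +08:00
shaohuzhang1 fda0bcb5d6 fix: Fix partial content loss when a knowledge base is exported and then re-imported 2024-09-20 16:20:08 +08:00
CaptainB 3e3b77e34d refactor: Handle vertically merged cells 2024-09-18 12:37:33 +08:00
CaptainB 746f587698 fix: Distinguish xls from xlsx for table data 2024-09-12 10:49:31 +08:00
shaohuzhang1 b924958176 feat: Table document upload supports cell images in xlsx files 2024-09-11 18:27:44 +08:00
shaohuzhang1 37445762b2 feat: QA pair document upload supports cell images in xlsx files 2024-09-11 15:55:29 +08:00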
shaohuzhang1 f9a76d7948
feat: Support the OpenAI interface #353 (#1128) 2024-09-09 14:47:25 +08:00