Commit Graph

20 Commits

Author SHA1 Message Date
shaohuzhang1 fa1aee6c3d perf: Memory optimization (#4318) 2025-11-10 15:21:22 +08:00
CaptainB d147b794ce chore: replace split_text with smart_split_paragraph in pdf_split_handle.py 2025-10-27 14:23:42 +08:00
shaohuzhang1 d92dcd722b fix: Add file name to prompt when processing images with doc (#4114) 2025-09-25 18:51:21 +08:00
CaptainB 4c9756839a chore: normalize with_filter parameter to boolean in split handle files
--bug=1057879 --user=刘瑞斌 [Knowledge Base] Auto-cleaning in advanced segmentation does not take effect https://www.tapd.cn/62980211/s/1727744
2025-07-10 15:06:19 +08:00
CaptainB aa901c7fc7 fix: update file URL paths to use relative references 2025-07-02 22:45:11 +08:00
CaptainB 0f1d57f0cb feat: enhance error logging for file processing in CSV, XLS, and DOC handlers 2025-06-30 12:49:50 +08:00
CaptainB 82a2203be6 fix: handle string type for limit and improve error logging in pdf_split_handle
--bug=1057493 --user=刘瑞斌 [Knowledge Base] Error when uploading a document with advanced segmentation https://www.tapd.cn/62980211/s/1720110
2025-06-30 12:47:47 +08:00
CaptainB d49f448a5f fix: correct image path replacement logic in zip_split_handle 2025-06-26 17:02:34 +08:00
CaptainB 37ac79dc5a feat: import File model in zip_split_handle for enhanced functionality
--bug=1057478 --user=刘瑞斌 [Knowledge Base] Segmentation fails when uploading a ZIP file to a general knowledge base https://www.tapd.cn/62980211/s/1719181
2025-06-26 16:56:28 +08:00
CaptainB e24a2001c5 feat: refine regex patterns in text_split_handle for improved comment detection
--bug=1057526 --user=刘瑞斌 [Knowledge Base] After importing a Markdown file into the knowledge base, code blocks display incorrectly in the segment details https://www.tapd.cn/62980211/s/1719131
2025-06-26 16:23:32 +08:00
CaptainB a73e0b10f9 refactor: replace logging with maxkb_logger for consistent logging across modules 2025-06-25 17:00:18 +08:00
CaptainB fe8f87834d refactor: replace logging with maxkb_logger for consistent logging across modules 2025-06-25 16:46:50 +08:00
wxg0103 c253e8b696 refactor: remove print 2025-06-24 15:30:42 +08:00
CaptainB 45908b91ff refactor: update dataset_id to knowledge_id in zip_split_handle.py and tools.py 2025-06-18 21:28:33 +08:00
CaptainB c0b770f41e refactor: update dataset_id to knowledge_id in zip_split_handle.py and tools.py 2025-06-18 21:15:53 +08:00
CaptainB 9a7281212d fix: update image URL paths to use OSS endpoints 2025-06-12 15:49:54 +08:00
wxg0103 b8b14884bd refactor: add application settings 2025-06-07 17:57:11 +08:00
wxg0103 93833849c1 refactor: file to oss 2025-06-06 11:42:31 +08:00
CaptainB c3581be9bd fix: rename image_name to file_name in zip_split_handle and remove workspace_id assignment in document 2025-05-13 12:47:59 +08:00
CaptainB 43bef216d5 refactor: reorganize file handling imports into a structured directory 2025-04-30 16:08:17 +08:00