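Optimize crawler: incremental crawling, API parameter tuning, Excel compatibility
Main changes:
1. Rename ygp_crawler.py -> main.py
2. API parameter tuning:
   - tradingProcess is fixed to "513,2C52,3C52" to filter precisely for bid-award results
   - pageSize is fixed to 50 for more efficient fetching
   - the time range is passed via publishStartTime/publishEndTime
3. The default query covers the last 3 months (previously the current day)
4. Incremental crawling is enabled by default (the -i flag is removed)
5. CSV files are written with a UTF-8 BOM so Excel can open them directly
6. Update the README.md documentation
7. Add front-end JS reference files under assets/
Usage:
- Incremental update: python main.py
- Full query: rm results.csv && python main.py
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>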
parent 4a458a897b
commit 5acb847bc1
43 README.md
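@@ -4,14 +4,15 @@
 
 ## Features
 
-- **Keyword filtering**: automatically selects notices whose title contains "中标结果" (bid-award result).
-- **Date filtering**: accepts a start and end date; fetches the current day's data by default.
-- **Automatic pagination**: handles multi-page fetching automatically.
+- **Precise filtering**: selects bid-award results directly via the `tradingProcess` field, with no title keyword filtering needed.
+- **Date range query**: accepts a start and end date; **defaults to the last 3 months**.
+- **Efficient pagination**: 50 records per page, with multi-page fetching handled automatically.
+- **API time filtering**: the time range is passed directly to the API via `publishStartTime` and `publishEndTime`, reducing unnecessary data transfer.
 - **Dynamic URL construction**: builds a directly accessible detail-page link from the fields returned by the API.
 - **Pure HTTP requests**: calls the official API directly with aiohttp; no browser required, lightweight and efficient.
-- **Real-time CSV saving**: data is saved to a CSV file as it arrives and is also printed to the terminal.
+- **Real-time CSV saving**: data is saved to a CSV file as it arrives (with a UTF-8 BOM, so Excel can open it directly) and is also printed to the terminal.
 - **Custom output path**: the output file path can be set via a parameter.
-- **Incremental crawling**: can compute the time range from an existing CSV file and fetch only new data, avoiding duplicates.
+- **Incremental crawling**: automatically computes the time range from the existing CSV file and fetches only new data, avoiding duplicates. To run a full query, delete the CSV file.
 - **Data ordering**: newest data first, oldest last, sorted by publish time in descending order.
 
 ## Requirements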
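The README now says the time range is sent as `publishStartTime` and `publishEndTime`, and the sample request recorded in this commit uses `20251106000000` through `20260204235959`, which is a 90-day window. A minimal sketch of how such a default window could be computed follows. The function name `default_time_range` and the fixed 90-day look-back are assumptions for illustration; the actual logic in `main.py` is not shown in this diff.

```python
from datetime import date, timedelta


def default_time_range(today: date, days: int = 90) -> tuple[str, str]:
    """Return (publishStartTime, publishEndTime) as YYYYMMDDHHMMSS strings.

    Assumption: "last 3 months" is treated as a fixed 90-day look-back,
    starting at 00:00:00 and ending at 23:59:59 of the current day.
    """
    start = today - timedelta(days=days)
    return start.strftime("%Y%m%d") + "000000", today.strftime("%Y%m%d") + "235959"


if __name__ == "__main__":
    # Hypothetical usage: reproduce the window seen in the recorded request.
    print(default_time_range(date(2026, 2, 4)))
```

Using whole-day boundaries keeps the window inclusive on both ends, which matches the `000000` and `235959` suffixes in the recorded request.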
@@ -39,37 +40,49 @@
 
 ## Usage
 
-### 1. Fetch data published today (default)
-Run the script directly; it fetches the "中标结果" (bid-award result) notices published today.
+### 1. Fetch the last 3 months of data (default)
+Run the script directly; it fetches bid-award result notices from the last 3 months.
 ```bash
-python ygp_crawler.py
+python main.py
 ```
 
 ### 2. Fetch a specific date range
 Use the `--start-date` and `--end-date` parameters (format: `YYYY-MM-DD`).
 ```bash
-python ygp_crawler.py --start-date 2026-02-01 --end-date 2026-02-04
+python main.py --start-date 2026-02-01 --end-date 2026-02-04
+```
+
+**Example: fetch the last 7 days of data**
+```bash
+python main.py --start-date 2026-01-29 --end-date 2026-02-05
 ```
 
 ### 3. Custom output file path
 Use the `-o` or `--output` parameter to set the output CSV file path (default: `results.csv`).
 ```bash
-python ygp_crawler.py --start-date 2026-02-01 --end-date 2026-02-04 -o my_data.csv
+python main.py --start-date 2026-02-01 --end-date 2026-02-04 -o my_data.csv
 ```
 
-### 4. Incremental crawling
-Use the `-i` or `--incremental` parameter to enable incremental mode. The script reads the existing CSV file, computes the time range, and fetches only new data.
+### 4. Incremental crawling (enabled by default)
+Incremental mode is enabled by default. The script reads the existing CSV file, computes the time range, and fetches only new data.
 
 **Automatic date range (recommended)**:
 ```bash
 # Start from the day after the latest date in the existing data and crawl up to today
-python ygp_crawler.py -i
+python main.py
 ```
 
 **Manual date range**:
 ```bash
-# Specify the date range manually in incremental mode
-python ygp_crawler.py -i --start-date 2026-02-01 --end-date 2026-02-04
+# Specify the date range manually (duplicates are still removed)
+python main.py --start-date 2026-02-01 --end-date 2026-02-04
+```
+
+**Full query**:
+```bash
+# Delete the CSV file, then run, to perform a full query
+rm results.csv
+python main.py
 ```
 
 **Incremental crawling behavior**:
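The README describes incremental mode as starting from the day after the latest date already in the CSV and crawling up to today, and a full query as what happens when the CSV is absent. The sketch below illustrates that rule only. The function name `incremental_range`, the 90-day fallback, and the assumption that the `发布时间` column starts with a `YYYY-MM-DD` date are hypothetical; the real `read_existing_csv` in `main.py` is only partly visible in this diff.

```python
import csv
from datetime import date, datetime, timedelta
from pathlib import Path


def incremental_range(csv_path, today: date, fallback_days: int = 90):
    """Return (start_date, end_date) for the next crawl.

    Assumptions: the CSV has a "发布时间" column whose value begins with
    YYYY-MM-DD, and a missing or empty file falls back to a 90-day window.
    """
    path = Path(csv_path)
    latest = None
    if path.exists():
        # utf-8-sig strips the BOM so the first header name is read cleanly.
        with path.open("r", encoding="utf-8-sig", newline="") as f:
            for row in csv.DictReader(f):
                raw = (row.get("发布时间") or "")[:10]
                try:
                    day = datetime.strptime(raw, "%Y-%m-%d").date()
                except ValueError:
                    continue  # skip rows without a parseable date
                if latest is None or day > latest:
                    latest = day
    if latest is None:
        return today - timedelta(days=fallback_days), today
    return latest + timedelta(days=1), today


if __name__ == "__main__":
    # Hypothetical usage against the default output file name.
    print(incremental_range("results.csv", date.today()))
```

If the CSV is already current, the returned start date falls after the end date; a caller would likely treat that as "nothing to fetch".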
1
assets/Banner-1ed718fb.js
Normal file
@ -0,0 +1 @@
|
|||||||
|
import{l as r,o as a,b as o,e as n,t as l,j as i,$ as c,E as u}from"./index-f1c6abff.js";const d={class:"wrapper"},m={class:"title"},_={key:0,style:{"font-size":"24px"},class:"subtitle"},p={__name:"Banner",props:{title:String,notRouteName:{type:Boolean,default:!1},subtitle:String},setup(e){return(t,b)=>{var s;return a(),o("div",{class:u(["home-banner",`home-banner-${(s=t.$route.meta)==null?void 0:s.code}`])},[n("div",d,[n("h1",m,l(e.title??(e.notRouteName?"":t.$route.name)),1),e.subtitle?(a(),o("p",_,l(e.subtitle),1)):i("",!0),c(t.$slots,"default",{},void 0,!0)])],2)}}},h=r(p,[["__scopeId","data-v-5b58dea9"]]);export{h as _};
|
||||||
1
assets/JyggDateFilter-b9da602b.js
Normal file
@ -0,0 +1 @@
|
|||||||
|
import{l as x,r as _,o as y,b as f,e as t,F as C,D as M,E as g,t as B,p,d as i,M as I,O as S,n as V,j as T,V as l,Q as h,Y as $,cA as A,Z as F,q as N,v as E}from"./index-f1c6abff.js";const j=o=>(N("data-v-a44551b8"),o=o(),E(),o),J={class:"list mt-1"},U=["onClick"],q={class:"rang-date"},z={class:"date"},L={class:"flex items-center mt-5"},O=j(()=>t("p",{class:"mb-0 opacity-50 text-black"},"请注意,时间跨度最多支持1年",-1)),Q={key:0,class:"mb-0 opacity-70"},R={__name:"JyggDateFilter",props:{modelValue:{type:Array,default:()=>[]},defaultType:{type:String,default:"month"},name:{type:String,default:"最近1个月"},showTips:{type:Boolean,default:!0}},emits:["update:modelValue","update:name"],setup(o,{expose:w,emit:D}){const b=o,n=_(""),d=_(""),r=_(!1),k=[{label:"最近1年",type:"all"},{label:"最近7天",type:"week",days:-7},{label:"最近15天",type:"two_week",days:-15},{label:"最近1个月",type:"month",days:-31},{label:"最近3个月",type:"three_month",days:-90}],v=_(b.defaultType);function m(u){const{type:e}=u;v.value=e;let c,s;switch(e){case"all":c="",s="";break;case"custom":if(!n.value||!d.value)return;if(c=l(n.value).format("YYYYMMDD000000"),s=l(d.value).format("YYYYMMDD235959"),l(d.value).isBefore(l(n.value),"day")){h.error({msg:"结束日期不能小于开始日期"});return}if(l(d.value).diff(l(n.value),"year",!0)>1){h.error({msg:"最多只能选择跨度为1年"});return}break;default:c=l().add(u.days,"day").format("YYYYMMDD000000"),s=l().format("YYYYMMDD235959");break}r.value=!1,D("update:name",u.label),D("update:modelValue",[c,s])}return m(k.find(u=>u.type===b.defaultType)),w({handleItemClick:m}),(u,e)=>{const c=$,s=A,Y=F;return y(),f("div",null,[t("ul",J,[(y(),f(C,null,M(k,a=>t("li",{onClick:Z=>m(a),class:g(["item",{active:v.value===a.type}]),key:a.type},B(a.label),11,U)),64)),t("li",{class:g(["item",{active:v.value==="custom"}])},[t("span",{onClick:e[0]||(e[0]=a=>r.value=!r.value)},[p("自定义 "),i(c,{class:"ml-1",icon:"down"})]),I(t("div",q,[t("div",z,[t("div",null,[p(" 开始日期 
"),i(s,{modelValue:n.value,"onUpdate:modelValue":e[1]||(e[1]=a=>n.value=a),class:"ml-1"},null,8,["modelValue"])]),t("div",null,[p(" 结束日期 "),i(s,{modelValue:d.value,"onUpdate:modelValue":e[2]||(e[2]=a=>d.value=a),class:"ml-1"},null,8,["modelValue"])])]),t("div",L,[O,i(Y,{class:"ml-auto",type:"primary",color:"#0B68DA",onClick:e[3]||(e[3]=a=>m({type:"custom",label:"自定义"}))},{default:V(()=>[p("确认")]),_:1}),i(Y,{color:"#0B68DA",plain:"",onClick:e[4]||(e[4]=a=>r.value=!1),class:"ml-5"},{default:V(()=>[p("取消")]),_:1})])],512),[[S,r.value]])],2)]),o.showTips?(y(),f("p",Q,"温馨提醒:平台默认展示或搜索最近1年交易公开数据,选择【自定义】时间可查询更多公告信息。")):T("",!0)])}}},H=x(R,[["__scopeId","data-v-a44551b8"]]);export{H as _};
|
||||||
1
assets/JyggList-964ba23f.js
Normal file
File diff suppressed because one or more lines are too long
1
assets/KeyCode-222bce94.js
Normal file
@ -0,0 +1 @@
|
|||||||
|
import{P as T}from"./default-1fc3bc07.js";import{a7 as S,c as N,ah as f,aa as _,co as M,z as P,am as R,an as U,a6 as C,d as I,b1 as A}from"./index-f1c6abff.js";function L(n,t){return n?n.contains(t):!1}var O=Symbol("TriggerContextKey"),F=function(t){return t?f(O,{setPortal:function(){},popPortal:!1}):{setPortal:function(){},popPortal:!1}},l=Symbol("PortalContextKey"),d=function(t){var r=arguments.length>1&&arguments[1]!==void 0?arguments[1]:{inTriggerContext:!0};S(l,{inTriggerContext:r.inTriggerContext,shouldRender:N(function(){var s=t||{},i=s.sPopupVisible,a=s.popupRef,c=s.forceRender,o=s.autoDestroy,u=!1;return(i||a||c)&&(u=!0),!i&&o&&(u=!1),u})})},v=function(){d({},{inTriggerContext:!1});var t=f(l,{shouldRender:N(function(){return!1}),inTriggerContext:!1});return{shouldRender:N(function(){return t.shouldRender.value||t.inTriggerContext===!1})}};const p=_({compatConfig:{MODE:3},name:"Portal",inheritAttrs:!1,props:{getContainer:T.func.isRequired,didUpdate:Function},setup:function(t,r){var s=r.slots,i=!0,a,c=v(),o=c.shouldRender;M(function(){i=!1,o.value&&(a=t.getContainer())});var u=P(o,function(){o.value&&!a&&(a=t.getContainer()),a&&u()});return R(function(){U(function(){if(o.value){var E;(E=t.didUpdate)===null||E===void 0||E.call(t,t)}})}),C(function(){a&&a.parentNode&&a.parentNode.removeChild(a)}),function(){if(!o.value)return null;if(i){var E;return(E=s.default)===null||E===void 0?void 0:E.call(s)}return a?I(A,{to:a},s):null}}});var 
e={MAC_ENTER:3,BACKSPACE:8,TAB:9,NUM_CENTER:12,ENTER:13,SHIFT:16,CTRL:17,ALT:18,PAUSE:19,CAPS_LOCK:20,ESC:27,SPACE:32,PAGE_UP:33,PAGE_DOWN:34,END:35,HOME:36,LEFT:37,UP:38,RIGHT:39,DOWN:40,PRINT_SCREEN:44,INSERT:45,DELETE:46,ZERO:48,ONE:49,TWO:50,THREE:51,FOUR:52,FIVE:53,SIX:54,SEVEN:55,EIGHT:56,NINE:57,QUESTION_MARK:63,A:65,B:66,C:67,D:68,E:69,F:70,G:71,H:72,I:73,J:74,K:75,L:76,M:77,N:78,O:79,P:80,Q:81,R:82,S:83,T:84,U:85,V:86,W:87,X:88,Y:89,Z:90,META:91,WIN_KEY_RIGHT:92,CONTEXT_MENU:93,NUM_ZERO:96,NUM_ONE:97,NUM_TWO:98,NUM_THREE:99,NUM_FOUR:100,NUM_FIVE:101,NUM_SIX:102,NUM_SEVEN:103,NUM_EIGHT:104,NUM_NINE:105,NUM_MULTIPLY:106,NUM_PLUS:107,NUM_MINUS:109,NUM_PERIOD:110,NUM_DIVISION:111,F1:112,F2:113,F3:114,F4:115,F5:116,F6:117,F7:118,F8:119,F9:120,F10:121,F11:122,F12:123,NUMLOCK:144,SEMICOLON:186,DASH:189,EQUALS:187,COMMA:188,PERIOD:190,SLASH:191,APOSTROPHE:192,SINGLE_QUOTE:222,OPEN_SQUARE_BRACKET:219,BACKSLASH:220,CLOSE_SQUARE_BRACKET:221,WIN_KEY:224,MAC_FF_META:224,WIN_IME:229,isTextModifyingKeyEvent:function(t){var r=t.keyCode;if(t.altKey&&!t.ctrlKey||t.metaKey||r>=e.F1&&r<=e.F12)return!1;switch(r){case e.ALT:case e.CAPS_LOCK:case e.CONTEXT_MENU:case e.CTRL:case e.DOWN:case e.END:case e.ESC:case e.HOME:case e.INSERT:case e.LEFT:case e.MAC_FF_META:case e.META:case e.NUMLOCK:case e.NUM_CENTER:case e.PAGE_DOWN:case e.PAGE_UP:case e.PAUSE:case e.PRINT_SCREEN:case e.RIGHT:case e.SHIFT:case e.UP:case e.WIN_KEY:case e.WIN_KEY_RIGHT:return!1;default:return!0}},isCharacterKey:function(t){if(t>=e.ZERO&&t<=e.NINE||t>=e.NUM_ZERO&&t<=e.NUM_MULTIPLY||t>=e.A&&t<=e.Z||window.navigator.userAgent.indexOf("WebKit")!==-1&&t===0)return!0;switch(t){case e.SPACE:case e.QUESTION_MARK:case e.NUM_PLUS:case e.NUM_MINUS:case e.NUM_PERIOD:case e.NUM_DIVISION:case e.SEMICOLON:case e.DASH:case e.EQUALS:case e.COMMA:case e.PERIOD:case e.SLASH:case e.APOSTROPHE:case e.SINGLE_QUOTE:case e.OPEN_SQUARE_BRACKET:case e.BACKSLASH:case e.CLOSE_SQUARE_BRACKET:return!0;default:return!1}}};const 
H=e;export{H as K,p as P,d as a,L as c,F as u};
|
||||||
1
assets/addEventListener-88586f3d.js
Normal file
@ -0,0 +1 @@
|
|||||||
|
import{d as l}from"./default-1fc3bc07.js";var m=function(e,t){var i=l({},e);return Object.keys(t).forEach(function(r){var a=i[r];if(a)a.type||a.default?a.default=t[r]:a.def?a.def(t[r]):i[r]={type:a,default:t[r]};else throw new Error("not have ".concat(r," prop"))}),i};const h=m;var u=function(e){return setTimeout(e,16)},c=function(e){return clearTimeout(e)};typeof window<"u"&&"requestAnimationFrame"in window&&(u=function(e){return window.requestAnimationFrame(e)},c=function(e){return window.cancelAnimationFrame(e)});var o=0,s=new Map;function v(n){s.delete(n)}function p(n){var e=arguments.length>1&&arguments[1]!==void 0?arguments[1]:1;o+=1;var t=o;function i(r){if(r===0)v(t),n();else{var a=u(function(){i(r-1)});s.set(t,a)}}return i(e),t}p.cancel=function(n){var e=s.get(n);return v(e),c(e)};var d=!1;try{var f=Object.defineProperty({},"passive",{get:function(){d=!0}});window.addEventListener("testPassive",null,f),window.removeEventListener("testPassive",null,f)}catch{}const w=d;function L(n,e,t,i){if(n&&n.addEventListener){var r=i;r===void 0&&w&&(e==="touchstart"||e==="touchmove"||e==="wheel")&&(r={passive:!1}),n.addEventListener(e,t,r)}return{remove:function(){n&&n.removeEventListener&&n.removeEventListener(e,t)}}}export{L as a,h as i,w as s,p as w};
|
||||||
1
assets/api-f26de41c.js
Normal file
@ -0,0 +1 @@
|
|||||||
|
import{H as i}from"./index-f1c6abff.js";const t=i(),r=i("yhzx"),s=new Map,o=new i("qrcode"),d={getNoticeUrl:(e,a="v2")=>t.get(`trading-notice/${a}/url`,{params:e}),getTradingDetailV2:(e,a="v2")=>t.get(`trading-notice/${a}/detail`,{params:e}),getFileList:(e,a="v2")=>t.get(`trading-notice/${a}/file/${e}`),getYgpCodeRequestParam:e=>t.get("trading-notice/new/qrcode/getRequestParam",{params:e}),getYgpUrl:e=>t.get("trading-notice/qrCode",{params:e}),getNodeList:e=>t.get("trading-notice/new/nodeList",{params:e}),getSingleNode:e=>t.get("trading-notice/new/singleNode",{params:e}),getNewDetailData:e=>t.get("trading-notice/new/detail",{params:e}),getAgendaData:e=>t.get("trading-notice/new/agenda/fetch",{params:e}),getRefDataset:e=>t.get("trading-notice/new/fetch/refDataset",{params:e}),getCorrelationDataset:e=>t.get("trading-notice/new/query/correlationDataset",{params:e}),getSubscribed:e=>r.get(`apis/user-subscribe/getIdByNoticeId/${e}`),subscribe:e=>r.post("apis/user-subscribe",e),unsubscribe:e=>r.post(`apis/user-subscribe/remove/${e}`),getDict:async(e,a="v2")=>{const n=s.get(e);if(n)return n;{const g=await t.get(`trading-notice/${a}/dict`,{params:{tradingProcess:e}});return s.set(e,g),g}},generateQrcode:e=>o.post("generate-qrcode",e,{noNeedTip:!0}),refreshQrcode:e=>o.post("refresh-qrcode",e,{noNeedTip:!0})},p=i("search"),c=new i("mhyy"),l={getTradeTypes:e=>c.get(`/cms/trade/public/${e}`),getAreas:e=>c.get(`/cms/trade/public/areas/${e}`),getJyggList:e=>p.post("v2/items",e),...d};export{l as y};
|
||||||
1
assets/arrow-down-bold-62f05036.js
Normal file
File diff suppressed because one or more lines are too long
1
assets/dateParse-086bcd39.js
Normal file
@ -0,0 +1 @@
|
|||||||
|
function u(n){if(n&&(n.length===14||n.length===8)){const s=n.substring(0,4),o=n.substring(4,6),i=n.substring(6,8);return[s,o,i].join("-")}return"-"}export{u as d};
|
||||||
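The `dateParse` helper above shows how the portal front end renders its compact timestamps: an 8-character `YYYYMMDD` or 14-character `YYYYMMDDHHMMSS` string becomes `YYYY-MM-DD`, and anything else becomes `-`. A Python rendering of the same rule is sketched here for reference. The name `format_api_date` is hypothetical and is not one of the crawler's own helpers (`parse_api_date` and `format_datetime` are referenced in `main.py`, but their bodies are not part of this diff).

```python
from typing import Optional


def format_api_date(value: Optional[str]) -> str:
    """Mirror the front-end dateParse rule for compact date strings.

    Accepts YYYYMMDD (8 chars) or YYYYMMDDHHMMSS (14 chars) and returns
    YYYY-MM-DD; any other input yields "-". No calendar validation is done,
    matching the JavaScript original.
    """
    if value and len(value) in (8, 14):
        return "-".join((value[0:4], value[4:6], value[6:8]))
    return "-"


if __name__ == "__main__":
    print(format_api_date("20260204235959"))
```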
71
assets/default-1fc3bc07.js
Normal file
File diff suppressed because one or more lines are too long
1
assets/index-a98c37f5.js
Normal file
File diff suppressed because one or more lines are too long
1
assets/index-da9d468e.js
Normal file
@ -0,0 +1 @@
|
|||||||
|
import{aa as c,ab as _,bf as h,o as n,b as r,e as t,f as e,ag as u,a2 as g,b7 as k,c as f,aS as $,$ as p,d as v,E as i,a0 as C,t as w,j as B,aj as N}from"./index-f1c6abff.js";const V={viewBox:"0 0 79 86",version:"1.1",xmlns:"http://www.w3.org/2000/svg","xmlns:xlink":"http://www.w3.org/1999/xlink"},x=["id"],E=["stop-color"],b=["stop-color"],R=["id"],S=["stop-color"],G=["stop-color"],I=["id"],z={id:"Illustrations",stroke:"none","stroke-width":"1",fill:"none","fill-rule":"evenodd"},j={id:"B-type",transform:"translate(-1268.000000, -535.000000)"},D={id:"Group-2",transform:"translate(1268.000000, 535.000000)"},M=["fill"],P=["fill"],L={id:"Group-Copy",transform:"translate(34.500000, 31.500000) scale(-1, 1) rotate(-25.000000) translate(-34.500000, -31.500000) translate(7.000000, 10.000000)"},O=["fill"],T=["fill"],U=["fill"],Z=["fill"],q=["fill"],A={id:"Rectangle-Copy-17",transform:"translate(53.000000, 45.000000)"},F=["fill","xlink:href"],H=["fill","mask"],J=["fill"],K=c({name:"ImgEmpty"}),Q=c({...K,setup(d){const s=_("empty"),l=h();return(a,m)=>(n(),r("svg",V,[t("defs",null,[t("linearGradient",{id:`linearGradient-1-${e(l)}`,x1:"38.8503086%",y1:"0%",x2:"61.1496914%",y2:"100%"},[t("stop",{"stop-color":`var(${e(s).cssVarBlockName("fill-color-1")})`,offset:"0%"},null,8,E),t("stop",{"stop-color":`var(${e(s).cssVarBlockName("fill-color-4")})`,offset:"100%"},null,8,b)],8,x),t("linearGradient",{id:`linearGradient-2-${e(l)}`,x1:"0%",y1:"9.5%",x2:"100%",y2:"90.5%"},[t("stop",{"stop-color":`var(${e(s).cssVarBlockName("fill-color-1")})`,offset:"0%"},null,8,S),t("stop",{"stop-color":`var(${e(s).cssVarBlockName("fill-color-6")})`,offset:"100%"},null,8,G)],8,R),t("rect",{id:`path-3-${e(l)}`,x:"0",y:"0",width:"17",height:"36"},null,8,I)]),t("g",z,[t("g",j,[t("g",D,[t("path",{id:"Oval-Copy-2",d:"M39.5,86 C61.3152476,86 79,83.9106622 79,81.3333333 C79,78.7560045 57.3152476,78 35.5,78 C13.6847524,78 0,78.7560045 0,81.3333333 C0,83.9106622 17.6847524,86 39.5,86 
Z",fill:`var(${e(s).cssVarBlockName("fill-color-3")})`},null,8,M),t("polygon",{id:"Rectangle-Copy-14",fill:`var(${e(s).cssVarBlockName("fill-color-7")})`,transform:"translate(27.500000, 51.500000) scale(1, -1) translate(-27.500000, -51.500000) ",points:"13 58 53 58 42 45 2 45"},null,8,P),t("g",L,[t("polygon",{id:"Rectangle-Copy-10",fill:`var(${e(s).cssVarBlockName("fill-color-7")})`,transform:"translate(11.500000, 5.000000) scale(1, -1) translate(-11.500000, -5.000000) ",points:"2.84078316e-14 3 18 3 23 7 5 7"},null,8,O),t("polygon",{id:"Rectangle-Copy-11",fill:`var(${e(s).cssVarBlockName("fill-color-5")})`,points:"-3.69149156e-15 7 38 7 38 43 -3.69149156e-15 43"},null,8,T),t("rect",{id:"Rectangle-Copy-12",fill:`url(#linearGradient-1-${e(l)})`,transform:"translate(46.500000, 25.000000) scale(-1, 1) translate(-46.500000, -25.000000) ",x:"38",y:"7",width:"17",height:"36"},null,8,U),t("polygon",{id:"Rectangle-Copy-13",fill:`var(${e(s).cssVarBlockName("fill-color-2")})`,transform:"translate(39.500000, 3.500000) scale(-1, 1) translate(-39.500000, -3.500000) ",points:"24 7 41 7 55 -3.63806207e-12 38 -3.63806207e-12"},null,8,Z)]),t("rect",{id:"Rectangle-Copy-15",fill:`url(#linearGradient-2-${e(l)})`,x:"13",y:"45",width:"40",height:"36"},null,8,q),t("g",A,[t("use",{id:"Mask",fill:`var(${e(s).cssVarBlockName("fill-color-8")})`,transform:"translate(8.500000, 18.000000) scale(-1, 1) translate(-8.500000, -18.000000) ","xlink:href":`#path-3-${e(l)}`},null,8,F),t("polygon",{id:"Rectangle-Copy",fill:`var(${e(s).cssVarBlockName("fill-color-9")})`,mask:`url(#mask-4-${e(l)})`,transform:"translate(12.000000, 9.000000) scale(-1, 1) translate(-12.000000, -9.000000) ",points:"7 0 24 0 20 18 7 16.5"},null,8,H)]),t("polygon",{id:"Rectangle-Copy-18",fill:`var(${e(s).cssVarBlockName("fill-color-2")})`,transform:"translate(66.000000, 51.500000) scale(-1, 1) translate(-66.000000, -51.500000) ",points:"62 45 79 45 70 58 53 58"},null,8,J)])])])]))}});var 
W=u(Q,[["__file","/home/runner/work/element-plus/element-plus/packages/components/empty/src/img-empty.vue"]]);const X=g({image:{type:String,default:""},imageSize:Number,description:{type:String,default:""}}),Y=["src"],t0={key:1},e0=c({name:"ElEmpty"}),s0=c({...e0,props:X,setup(d){const s=d,{t:l}=k(),a=_("empty"),m=f(()=>s.description||l("el.table.emptyText")),y=f(()=>({width:$(s.imageSize)}));return(o,o0)=>(n(),r("div",{class:i(e(a).b())},[t("div",{class:i(e(a).e("image")),style:C(e(y))},[o.image?(n(),r("img",{key:0,src:o.image,ondragstart:"return false"},null,8,Y)):p(o.$slots,"image",{key:1},()=>[v(W)])],6),t("div",{class:i(e(a).e("description"))},[o.$slots.description?p(o.$slots,"description",{key:0}):(n(),r("p",t0,w(e(m)),1))],2),o.$slots.default?(n(),r("div",{key:0,class:i(e(a).e("bottom"))},[p(o.$slots,"default")],2)):B("v-if",!0)],2))}});var l0=u(s0,[["__file","/home/runner/work/element-plus/element-plus/packages/components/empty/src/empty.vue"]]);const n0=N(l0);export{n0 as E};
|
||||||
41
assets/index-f1c6abff.js
Normal file
File diff suppressed because one or more lines are too long
4
assets/index-f29230c3.js
Normal file
File diff suppressed because one or more lines are too long
1
assets/index-ff57e0d0.js
Normal file
@ -0,0 +1 @@
|
|||||||
|
import{A as d,c as f,r as l,z as v,K as h,L as N,o as m,b as z,d as n,n as S,F as b,_ as x,M as w,O as B,N as C,P as D}from"./index-f1c6abff.js";import{_ as k}from"./Banner-1ed718fb.js";import{J as q,a as F}from"./JyggList-964ba23f.js";import{y as J}from"./api-f26de41c.js";import"./default-1fc3bc07.js";import"./index-a98c37f5.js";import"./KeyCode-222bce94.js";import"./index-f29230c3.js";import"./addEventListener-88586f3d.js";import"./objectDestructuringEmpty-a5c19c06.js";/* empty css */import"./JyggDateFilter-b9da602b.js";/* empty css *//* empty css */import"./arrow-down-bold-62f05036.js";/* empty css */import"./dateParse-086bcd39.js";import"./useJump-b2a96f17.js";import"./index-da9d468e.js";const Z={__name:"index",setup(L){const i=d();i.getBannerConfig("交易公告");const r=f(()=>i.banner),o=l({type:"trading-type",openConvert:!1});v(o,a=>{t.pageNo=1,p(a)});const t=h({total:0,pageNo:1,pageSize:10});function g(a){t.pageNo=a,p()}const s=l([]),p=N(async(a=o.value)=>{const e=await J.getJyggList({...a,pageNo:t.pageNo,pageSize:t.pageSize});s.value=e.pageData,t.total=+e.total},500);return(a,e)=>{const c=x,u=k,_=D;return m(),z(b,null,[n(c),n(u,{title:r.value.name,subtitle:r.value.description},null,8,["title","subtitle"]),n(q,{query:o.value,"onUpdate:query":e[0]||(e[0]=y=>o.value=y)},{default:S(()=>[n(F,{"list-data":s.value},null,8,["list-data"]),w((m(),C(_,{key:Math.random(),class:"mt-6",total:t.total,index:t.pageNo,onChange:g,"page-size":t.pageSize},null,8,["total","index","page-size"])),[[B,t.total>0]])]),_:1},8,["query"])],64)}}};export{Z as default};
|
||||||
18
assets/objectDestructuringEmpty-a5c19c06.js
Normal file
@ -0,0 +1,18 @@
|
|||||||
|
import{d as p,n as l}from"./default-1fc3bc07.js";var m=`accept acceptcharset accesskey action allowfullscreen allowtransparency
|
||||||
|
alt async autocomplete autofocus autoplay capture cellpadding cellspacing challenge
|
||||||
|
charset checked classid classname colspan cols content contenteditable contextmenu
|
||||||
|
controls coords crossorigin data datetime default defer dir disabled download draggable
|
||||||
|
enctype form formaction formenctype formmethod formnovalidate formtarget frameborder
|
||||||
|
headers height hidden high href hreflang htmlfor for httpequiv icon id inputmode integrity
|
||||||
|
is keyparams keytype kind label lang list loop low manifest marginheight marginwidth max maxlength media
|
||||||
|
mediagroup method min minlength multiple muted name novalidate nonce open
|
||||||
|
optimum pattern placeholder poster preload radiogroup readonly rel required
|
||||||
|
reversed role rowspan rows sandbox scope scoped scrolling seamless selected
|
||||||
|
shape size sizes span spellcheck src srcdoc srclang srcset start step style
|
||||||
|
summary tabindex target title type usemap value width wmode wrap`,h=`onCopy onCut onPaste onCompositionend onCompositionstart onCompositionupdate onKeydown
|
||||||
|
onKeypress onKeyup onFocus onBlur onChange onInput onSubmit onClick onContextmenu onDoubleclick onDblclick
|
||||||
|
onDrag onDragend onDragenter onDragexit onDragleave onDragover onDragstart onDrop onMousedown
|
||||||
|
onMouseenter onMouseleave onMousemove onMouseout onMouseover onMouseup onSelect onTouchcancel
|
||||||
|
onTouchend onTouchmove onTouchstart onTouchstartPassive onTouchmovePassive onScroll onWheel onAbort onCanplay onCanplaythrough
|
||||||
|
onDurationchange onEmptied onEncrypted onEnded onError onLoadeddata onLoadedmetadata
|
||||||
|
onLoadstart onPause onPlay onPlaying onProgress onRatechange onSeeked onSeeking onStalled onSuspend onTimeupdate onVolumechange onWaiting onLoad onError`,s="".concat(m," ").concat(h).split(/[\s\n]+/),f="aria-",g="data-";function d(n,e){return n.indexOf(e)===0}function b(n){var e=arguments.length>1&&arguments[1]!==void 0?arguments[1]:!1,o;e===!1?o={aria:!0,data:!0,attr:!0}:e===!0?o={aria:!0}:o=p({},e);var t={};return Object.keys(n).forEach(function(a){(o.aria&&(a==="role"||d(a,f))||o.data&&d(a,g)||o.attr&&(s.includes(a)||s.includes(a.toLowerCase())))&&(t[a]=n[a])}),t}var E=function(){return l()&&window.document.documentElement},u=function(e){if(l()&&window.document.documentElement){var o=Array.isArray(e)?e:[e],t=window.document.documentElement;return o.some(function(a){return a in t.style})}return!1},v=function(e,o){if(!u(e))return!1;var t=document.createElement("div"),a=t.style[e];return t.style[e]=o,t.style[e]!==a};function C(n,e){return!Array.isArray(n)&&e!==void 0?v(n,e):u(n)}var i;function y(n){if(typeof document>"u")return 0;if(n||i===void 0){var e=document.createElement("div");e.style.width="100%",e.style.height="200px";var o=document.createElement("div"),t=o.style;t.position="absolute",t.top="0",t.left="0",t.pointerEvents="none",t.visibility="hidden",t.width="200px",t.height="150px",t.overflow="hidden",o.appendChild(e),document.body.appendChild(o);var a=e.offsetWidth;o.style.overflow="scroll";var r=e.offsetWidth;a===r&&(r=o.clientWidth),document.body.removeChild(o),i=a-r}return i}function c(n){var e=n.match(/^(.*)px$/),o=Number(e==null?void 0:e[1]);return Number.isNaN(o)?y():o}function S(n){if(typeof document>"u"||!n||!(n instanceof Element))return{width:0,height:0};var e=getComputedStyle(n,"::-webkit-scrollbar"),o=e.width,t=e.height;return{width:c(o),height:c(t)}}function x(n){if(n==null)throw new TypeError("Cannot destructure "+n)}export{x as _,S as a,E as c,y as g,C as i,b as p};
|
||||||
1
assets/useJump-b2a96f17.js
Normal file
@ -0,0 +1 @@
|
|||||||
|
import{y as u,u as y,cK as l,Q as f}from"./index-f1c6abff.js";const C=()=>{u();const i=y();async function a(e){const t=await l.getAgendaDetail(e.recordId);if(!t||!t.version){f.info("当前交易日程未发布相关交易公告信息");return}const n=t.pubServicePlat,o=e.tradingType;if(e.edition==="v0"){const{href:p}=i.resolve({name:"交易公告V0",params:{infoId:t.noticeId},query:{source:n,titleDetails:o}});window.open(p);return}if(!t.tradingType)return;const c={edition:t.version,tradingType:t.tradingType},s={noticeId:t.noticeId,projectCode:t.projectCode,bizCode:t.bizCode,siteCode:t.siteCode,publishDate:t.publishDate||t.date,source:n,titleDetails:o,classify:t.projectType??e.noticeType},{href:r}=i.resolve({name:"交易公告详情-new",params:c,query:s});window.open(r)}function d(e){const t=e.pubServicePlat,n=e.noticeSecondTypeDesc;if(e.edition==="v0"){const{href:r}=i.resolve({name:"交易公告V0",params:{infoId:e.noticeId},query:{source:t,titleDetails:n}});window.open(r);return}const o={edition:e.edition,tradingType:e.noticeSecondType},c={noticeId:e.noticeId,projectCode:e.projectCode,bizCode:e.tradingProcess,siteCode:e.regionCode,publishDate:e.publishDate,source:t,titleDetails:n,classify:e.projectType},{href:s}=i.resolve({name:"交易公告详情-new",params:o,query:c});return window.open(s),s}return{jyrcJump:a,jygkJump:d}};export{C as u};
|
||||||
@@ -1,20 +1,20 @@
 curl 'https://ygp.gdzwfw.gov.cn/ggzy-portal/search/v2/items' \
-  -H 'Accept: application/json, text/plain, */*' \
-  -H 'Accept-Language: zh-CN,zh;q=0.9' \
   -H 'Connection: keep-alive' \
-  -H 'Content-Type: application/json' \
-  -b '_horizon_sid=85f7f27a-a1ca-4d3b-87b1-9c733b1d846c; _horizon_uid=ec39e18f-7968-4277-afeb-0bea42d5de45' \
+  -b '_horizon_uid=ec39e18f-7968-4277-afeb-0bea42d5de45; _horizon_sid=0f61bd04-5719-467b-8f3a-4c2fb2208da4' \
   -H 'Origin: https://ygp.gdzwfw.gov.cn' \
   -H 'Referer: https://ygp.gdzwfw.gov.cn/' \
   -H 'Sec-Fetch-Dest: empty' \
   -H 'Sec-Fetch-Mode: cors' \
   -H 'Sec-Fetch-Site: same-origin' \
   -H 'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/144.0.0.0 Safari/537.36' \
-  -H 'X-Dgi-Req-App: ggzy-portal' \
-  -H 'X-Dgi-Req-Nonce: xHui3fC2RmcIuHyQ' \
-  -H 'X-Dgi-Req-Signature: ccd2c2a549d3ea1932cb4e843715be6c414af04b1f025ca62686913efb5b3d39' \
-  -H 'X-Dgi-Req-Timestamp: 1770198773779' \
+  -H 'accept: application/json, text/plain, */*' \
+  -H 'accept-language: zh-CN,zh;q=0.9' \
+  -H 'content-type: application/json' \
   -H 'sec-ch-ua: "Not(A:Brand";v="8", "Chromium";v="144", "Google Chrome";v="144"' \
   -H 'sec-ch-ua-mobile: ?0' \
   -H 'sec-ch-ua-platform: "macOS"' \
-  --data-raw '{"type":"trading-type","openConvert":false,"keyword":"","siteCode":"44","secondType":"A","tradingProcess":"","thirdType":"[]","projectType":"","publishStartTime":"","publishEndTime":"","pageNo":2,"pageSize":10}'
+  -H 'x-dgi-req-app: ggzy-portal' \
+  -H 'x-dgi-req-nonce: fCuDiZcgnqabdAfT' \
+  -H 'x-dgi-req-signature: 73c3b89179feda9eeb62397c326c440be997a8aa9eab5387820b8642f2c72b79' \
+  -H 'x-dgi-req-timestamp: 1770220343291' \
+  --data-raw '{"type":"trading-type","openConvert":false,"keyword":"","siteCode":"44","secondType":"A","tradingProcess":"513,2C52,3C52","thirdType":"[]","projectType":"","publishStartTime":"20251106000000","publishEndTime":"20260204235959","pageNo":1,"pageSize":50}'
File diff suppressed because it is too large
@@ -22,7 +22,6 @@ def parse_args():
     parser.add_argument("--start-date", help="开始日期 (YYYY-MM-DD)")
     parser.add_argument("--end-date", help="结束日期 (YYYY-MM-DD)")
     parser.add_argument("--output", "-o", default="results.csv", help="输出CSV文件路径 (默认: results.csv)")
-    parser.add_argument("--incremental", "-i", action="store_true", help="启用增量爬取模式")
     return parser.parse_args()
 
 
@@ -72,7 +71,7 @@ def read_existing_csv(csv_path):
     max_date = None
 
     try:
-        with open(csv_path, "r", encoding="utf-8", newline="") as f:
+        with open(csv_path, "r", encoding="utf-8-sig", newline="") as f:
             reader = csv.reader(f)
             header = next(reader, None)  # Skip header
             if not header:
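The switch from `utf-8` to `utf-8-sig` in `read_existing_csv` matters because the CSV is now written with a UTF-8 BOM for Excel. Reading such a file as plain `utf-8` leaves the BOM character attached to the first header field. The round trip below is a small sketch of that effect; `write_rows` and `read_header` are hypothetical helper names, not functions from `main.py`.

```python
import csv
import tempfile
from pathlib import Path


def write_rows(path, rows):
    """Write rows with a UTF-8 BOM so Excel detects the encoding."""
    with open(path, "w", encoding="utf-8-sig", newline="") as f:
        csv.writer(f).writerows(rows)


def read_header(path, encoding):
    """Return the first CSV row decoded with the given encoding."""
    with open(path, "r", encoding=encoding, newline="") as f:
        return next(csv.reader(f), None)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "results.csv"
        write_rows(path, [["项目标题", "发布时间", "详细链接"]])
        # utf-8-sig drops the BOM; plain utf-8 keeps it as "\ufeff" on the first field.
        print(read_header(path, "utf-8-sig"))
        print(read_header(path, "utf-8"))
```

Reading with `utf-8-sig` is also safe for files that have no BOM, so the same reader handles CSVs produced before this change.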
@@ -125,71 +124,50 @@ def construct_detail_url(item):
     return f"{base_url}?{query}"
 
 
-def build_search_payload(page_num=1, page_size=10):
-    """Build the search API payload."""
+def build_search_payload(page_num=1, publish_start_time="", publish_end_time=""):
+    """Build the search API payload.
+
+    Args:
+        page_num: Page number
+        publish_start_time: Start time in format YYYYMMDDHHMMSS
+        publish_end_time: End time in format YYYYMMDDHHMMSS
+    """
     return {
         "pageNo": page_num,
-        "pageSize": page_size,
+        "pageSize": 50,  # Fixed page size 50
         "keyword": "",
         "siteCode": "44",
         "secondType": "",
-        "tradingProcess": "",
+        "tradingProcess": "513,2C52,3C52",  # Fixed to search for 中标结果
         "thirdType": "[]",
         "projectType": "",
-        "publishStartTime": "",
-        "publishEndTime": "",
+        "publishStartTime": publish_start_time,  # Format: YYYYMMDDHHMMSS
+        "publishEndTime": publish_end_time,  # Format: YYYYMMDDHHMMSS
         "type": "trading-type"
     }
 
 
def process_items(items, start_date, end_date):
|
def process_items(items):
|
||||||
"""Process a batch of items and filter by date and keyword."""
|
"""Process a batch of items - API already filters by tradingProcess and date range."""
|
||||||
page_results = []
|
page_results = []
|
||||||
stop_signal = False
|
|
||||||
min_date_on_page = None
|
|
||||||
|
|
||||||
for item in items:
|
for item in items:
|
||||||
title = item.get("noticeTitle", "")
|
title = item.get("noticeTitle", "")
|
||||||
pub_date_str = item.get("publishDate", "")
|
pub_date_str = item.get("publishDate", "")
|
||||||
item_date = parse_api_date(pub_date_str)
|
|
||||||
|
|
||||||
if item_date:
|
page_results.append({
|
||||||
if min_date_on_page is None or item_date < min_date_on_page:
|
"项目标题": title,
|
||||||
min_date_on_page = item_date
|
"发布时间": format_datetime(pub_date_str),
|
||||||
|
"详细链接": construct_detail_url(item)
|
||||||
|
})
|
||||||
|
|
||||||
if item_date > end_date:
|
return page_results
|
||||||
continue
|
|
||||||
if item_date < start_date:
|
|
||||||
continue
|
|
||||||
|
|
||||||
if "中标结果" in title:
|
|
||||||
page_results.append({
|
|
||||||
"项目标题": title,
|
|
||||||
"发布时间": format_datetime(pub_date_str),
|
|
||||||
"详细链接": construct_detail_url(item)
|
|
||||||
})
|
|
||||||
|
|
||||||
# Only stop if all items on this page are older than start_date
|
|
||||||
# and there are no matching results
|
|
||||||
if min_date_on_page and min_date_on_page < start_date and not page_results:
|
|
||||||
# Check if the newest item is also older than start_date
|
|
||||||
max_date_on_page = None
|
|
||||||
for item in items:
|
|
||||||
item_date = parse_api_date(item.get("publishDate", ""))
|
|
||||||
if item_date:
|
|
||||||
if max_date_on_page is None or item_date > max_date_on_page:
|
|
||||||
max_date_on_page = item_date
|
|
||||||
|
|
||||||
if max_date_on_page and max_date_on_page < start_date:
|
|
||||||
stop_signal = True
|
|
||||||
|
|
||||||
return page_results, stop_signal
|
|
||||||
|
|
||||||
|
|
||||||
async def fetch_page(session, page_num, page_size=10):
|
async def fetch_page(session, page_num, publish_start_time="", publish_end_time=""):
|
||||||
"""Fetch a single page of data from the API."""
|
"""Fetch a single page of data from the API."""
|
||||||
url = f"{API_BASE_URL}/search/v2/items"
|
url = f"{API_BASE_URL}/search/v2/items"
|
||||||
payload = build_search_payload(page_num, page_size)
|
payload = build_search_payload(page_num, publish_start_time, publish_end_time)
|
||||||
|
|
||||||
headers = {
|
headers = {
|
||||||
"Content-Type": "application/json",
|
"Content-Type": "application/json",
|
||||||
@ -227,19 +205,26 @@ def deduplicate_results(new_results, existing_data):
|
|||||||
return unique_results
|
return unique_results
|
||||||
|
|
||||||
|
|
||||||
|
def format_api_datetime(dt):
|
||||||
|
"""Format datetime to API format YYYYMMDDHHMMSS."""
|
||||||
|
return dt.strftime("%Y%m%d%H%M%S")
|
||||||
|
|
||||||
|
|
||||||
async def run():
|
async def run():
|
||||||
args = parse_args()
|
args = parse_args()
|
||||||
|
|
||||||
today = date.today()
|
today = date.today()
|
||||||
start_date = today
|
# Default: last 3 months
|
||||||
|
three_months_ago = today - timedelta(days=90)
|
||||||
|
start_date = three_months_ago
|
||||||
end_date = today
|
end_date = today
|
||||||
|
|
||||||
# Read existing CSV if in incremental mode
|
# Always use incremental mode - read existing CSV if exists
|
||||||
existing_data = []
|
existing_data = []
|
||||||
csv_min_date = None
|
csv_min_date = None
|
||||||
csv_max_date = None
|
csv_max_date = None
|
||||||
|
|
||||||
if args.incremental and os.path.exists(args.output):
|
if os.path.exists(args.output):
|
||||||
existing_data, csv_min_date, csv_max_date = read_existing_csv(args.output)
|
existing_data, csv_min_date, csv_max_date = read_existing_csv(args.output)
|
||||||
if csv_min_date and csv_max_date:
|
if csv_min_date and csv_max_date:
|
||||||
print(f"Existing data range: {csv_min_date} to {csv_max_date}", file=sys.stderr)
|
print(f"Existing data range: {csv_min_date} to {csv_max_date}", file=sys.stderr)
|
||||||
@ -252,8 +237,8 @@ async def run():
|
|||||||
except ValueError:
|
except ValueError:
|
||||||
print(f"Error: Invalid start date format: {args.start_date}", file=sys.stderr)
|
print(f"Error: Invalid start date format: {args.start_date}", file=sys.stderr)
|
||||||
sys.exit(1)
|
sys.exit(1)
|
||||||
elif args.incremental and csv_max_date:
|
elif csv_max_date:
|
||||||
# In incremental mode without explicit start_date, fetch from max_date+1 to today
|
# Without explicit start_date, fetch from max_date+1 to today
|
||||||
start_date = csv_max_date + timedelta(days=1)
|
start_date = csv_max_date + timedelta(days=1)
|
||||||
|
|
||||||
if args.end_date:
|
if args.end_date:
|
||||||
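Note on the start-date logic in this hunk: the precedence appears to be explicit `--start-date`, then the day after the newest CSV record, then the 90-day default. The sketch below restates that order as a standalone function; `resolve_start_date` and its parameters are hypothetical names used only for illustration.

```python
from datetime import date, timedelta


def resolve_start_date(today, explicit_start=None, csv_max_date=None, default_days=90):
    """Pick the crawl start date (hypothetical helper, not in the commit)."""
    if explicit_start is not None:
        # A user-supplied start date always wins.
        return explicit_start
    if csv_max_date is not None:
        # Incremental run: resume the day after the newest stored record.
        return csv_max_date + timedelta(days=1)
    # No CSV yet: fall back to the default window (90 days in the commit).
    return today - timedelta(days=default_days)


today = date(2025, 6, 30)
default_start = resolve_start_date(today)
incremental_start = resolve_start_date(today, csv_max_date=date(2025, 6, 20))
explicit_start = resolve_start_date(
    today, explicit_start=date(2025, 1, 1), csv_max_date=date(2025, 6, 20)
)
```

Note that "3 months" is approximated as 90 days in the commit, so the default window is not aligned to calendar months.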
@ -268,10 +253,15 @@ async def run():
|
|||||||
print(f"Existing records: {len(existing_data)}", file=sys.stderr)
|
print(f"Existing records: {len(existing_data)}", file=sys.stderr)
|
||||||
sys.exit(0)
|
sys.exit(0)
|
||||||
|
|
||||||
|
# Format time for API: YYYYMMDDHHMMSS
|
||||||
|
start_datetime = datetime.combine(start_date, datetime.min.time())
|
||||||
|
end_datetime = datetime.combine(end_date, datetime.max.time().replace(microsecond=0))
|
||||||
|
publish_start_time = format_api_datetime(start_datetime)
|
||||||
|
publish_end_time = format_api_datetime(end_datetime)
|
||||||
|
|
||||||
print(f"Crawling range: {start_date} to {end_date}", file=sys.stderr)
|
print(f"Crawling range: {start_date} to {end_date}", file=sys.stderr)
|
||||||
|
print(f"API time range: {publish_start_time} to {publish_end_time}", file=sys.stderr)
|
||||||
print(f"Output file: {args.output}", file=sys.stderr)
|
print(f"Output file: {args.output}", file=sys.stderr)
|
||||||
if args.incremental:
|
|
||||||
print(f"Incremental mode: ON", file=sys.stderr)
|
|
||||||
|
|
||||||
# Collect all new results first
|
# Collect all new results first
|
||||||
new_results = []
|
new_results = []
|
||||||
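Note on the API time range built in this hunk: the start date is expanded to the first second of the day and the end date to the last second, then both are formatted as `YYYYMMDDHHMMSS`. A minimal sketch, where `api_time_range` is a hypothetical wrapper and `format_api_datetime` mirrors the function added by the commit:

```python
from datetime import date, datetime, time


def format_api_datetime(dt):
    """Format a datetime as YYYYMMDDHHMMSS (same format string as the commit)."""
    return dt.strftime("%Y%m%d%H%M%S")


def api_time_range(start_date, end_date):
    """Expand two dates to whole-day bounds (hypothetical helper)."""
    start_dt = datetime.combine(start_date, time.min)
    # time.max carries microseconds; drop them so the bound is 23:59:59.
    end_dt = datetime.combine(end_date, time.max.replace(microsecond=0))
    return format_api_datetime(start_dt), format_api_datetime(end_dt)


start_str, end_str = api_time_range(date(2025, 1, 1), date(2025, 3, 31))
```

Using inclusive whole-day bounds means a record published at any time on the end date still falls inside the requested range.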
@ -284,7 +274,7 @@ async def run():
|
|||||||
|
|
||||||
await delay(500)
|
await delay(500)
|
||||||
|
|
||||||
resp = await fetch_page(session, page_num)
|
resp = await fetch_page(session, page_num, publish_start_time, publish_end_time)
|
||||||
|
|
||||||
if resp is None:
|
if resp is None:
|
||||||
print("Failed to fetch data. Stopping.", file=sys.stderr)
|
print("Failed to fetch data. Stopping.", file=sys.stderr)
|
||||||
@ -297,7 +287,7 @@ async def run():
|
|||||||
print("No more items.", file=sys.stderr)
|
print("No more items.", file=sys.stderr)
|
||||||
break
|
break
|
||||||
|
|
||||||
results, stop = process_items(items, start_date, end_date)
|
results = process_items(items)
|
||||||
new_results.extend(results)
|
new_results.extend(results)
|
||||||
|
|
||||||
# Print to console immediately
|
# Print to console immediately
|
||||||
@ -305,10 +295,6 @@ async def run():
|
|||||||
print(json.dumps(r, ensure_ascii=False))
|
print(json.dumps(r, ensure_ascii=False))
|
||||||
sys.stdout.flush()
|
sys.stdout.flush()
|
||||||
|
|
||||||
if stop:
|
|
||||||
print("Date range satisfied. Stopping.", file=sys.stderr)
|
|
||||||
break
|
|
||||||
|
|
||||||
pages = data.get("pageTotal", 0)
|
pages = data.get("pageTotal", 0)
|
||||||
|
|
||||||
if page_num >= pages:
|
if page_num >= pages:
|
||||||
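Note on the loop after this change: with the date-based `stop` signal removed, the crawl ends only when a request fails, a page comes back empty, or `page_num` reaches `pageTotal`. The sketch below shows that control flow against a stub fetcher. The response keys `data` and `pageData`, the `max_pages` guard, and both function names are assumptions for illustration; only `pageTotal` appears in the diff.

```python
import asyncio


async def crawl_all_pages(fetch_page, max_pages=1000):
    """Collect items page by page until the API reports no more pages."""
    results = []
    page_num = 1
    while page_num <= max_pages:  # safety cap, not present in the commit
        resp = await fetch_page(page_num)
        if resp is None:
            break  # request failed
        data = resp.get("data", {})        # assumed response envelope
        items = data.get("pageData", [])   # assumed key for the item list
        if not items:
            break  # no more items
        results.extend(items)
        if page_num >= data.get("pageTotal", 0):
            break  # last page reached
        page_num += 1
    return results


async def fake_fetch_page(page_num):
    # Stub standing in for the real HTTP call: two pages of dummy items.
    pages = {1: ["a", "b"], 2: ["c"]}
    return {"data": {"pageData": pages.get(page_num, []), "pageTotal": 2}}


collected = asyncio.run(crawl_all_pages(fake_fetch_page))
```

Because the server now filters by `tradingProcess` and time range, every returned item is kept and the page count alone decides when to stop.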
@ -320,14 +306,15 @@ async def run():
|
|||||||
|
|
||||||
print(f"\nNew results fetched: {len(new_results)}", file=sys.stderr)
|
print(f"\nNew results fetched: {len(new_results)}", file=sys.stderr)
|
||||||
|
|
||||||
# Deduplicate if in incremental mode
|
# Always deduplicate against existing data
|
||||||
if args.incremental and existing_data:
|
if existing_data:
|
||||||
new_results = deduplicate_results(new_results, existing_data)
|
new_results = deduplicate_results(new_results, existing_data)
|
||||||
print(f"After deduplication: {len(new_results)}", file=sys.stderr)
|
print(f"After deduplication: {len(new_results)}", file=sys.stderr)
|
||||||
|
|
||||||
# Write to CSV: new data first, then existing data
|
# Write to CSV: new data first, then existing data
|
||||||
with open(args.output, "w", newline="", encoding="utf-8") as csv_file:
|
# Use utf-8-sig to add BOM for Excel to auto-detect encoding and delimiter
|
||||||
csv_writer = csv.writer(csv_file)
|
with open(args.output, "w", newline="", encoding="utf-8-sig") as csv_file:
|
||||||
|
csv_writer = csv.writer(csv_file, delimiter=",", quoting=csv.QUOTE_MINIMAL)
|
||||||
csv_writer.writerow(["项目标题", "发布时间", "详细链接"])
|
csv_writer.writerow(["项目标题", "发布时间", "详细链接"])
|
||||||
|
|
||||||
# Write new results first (newer data)
|
# Write new results first (newer data)
|
||||||