SQL问题：array select子查询优化

在我们的数据库中，灌入了上海出租车数据，大概有36亿条数据，开始主要的目的是测试MPP数据库性能，也顺手开展了一些出租车问题的研究。

在从出租车点数据转化为线数据过程中，遇到了一个奇怪的sql优化问题。

点数据表为taxi_point：

ius=# \d taxi_point
  Append-Only Columnar Table "public.taxi_point"
 Column  |            Type             | Modifiers 
---------+-----------------------------+-----------
 tid     | integer                     | not null
 isempty | smallint                    | 
 lon     | real                        | 
 lat     | real                        | 
 gpstime | timestamp without time zone | 
Checksum: t
Distributed randomly

包含每段行程的表taxi_trip：

ius=# \d taxi_trip
               Table "public.taxi_trip"
   Column    |            Type             | Modifiers 
-------------+-----------------------------+-----------
 tid         | integer                     | not null
 isempty     | smallint                    | not null
 pickup_lon  | real                        | not null
 pickup_lat  | real                        | not null
 dropoff_lon | real                        | not null
 dropoff_lat | real                        | not null
 distance    | real                        | 
 pickup_ts   | timestamp without time zone | not null
 dropoff_ts  | timestamp without time zone | not null
 geom        | geometry(LineString,4326)   | 
Indexes:
    "taxi_trip_pkey" PRIMARY KEY, btree (tid, isempty, pickup_ts)
Distributed by: (tid, isempty, pickup_ts)

两个表对应的记录数量大致为千万级和十亿级别：

ius=# select count(*) from taxi_point;
   count    
------------
 1480896950
(1 row)

ius=# select count(*) from taxi_trip;
  count   
----------
 25110980
(1 row)

现在，想从taxi_point表中更新每段行程的线数据到taxi_trip表中的geom字段，直观上写出来SQL语句：

update taxi_trip tp set geom=ST_SetSRID(ST_MakeLine(array(select ST_MakePoint(lon,lat) from taxi_point t where t.tid=tp.tid and t.gpstime>=tp.pickup_ts and t.gpstime<=tp.dropoff_ts order by gpstime asc)),4326)

没想到，这个sql执行非常慢，三天都没有执行完。于是就想进一步优化，看看能不能提高速度。首先在一个点上的数据进行测试：

update taxi_trip tp set geom=ST_SetSRID(ST_MakeLine(array(select ST_MakePoint(lon,lat) from taxi_point t where t.tid=tp.tid and t.gpstime>=tp.pickup_ts and t.gpstime<=tp.dropoff_ts order by gpstime asc)),4326) where tid=30050;

发现这个语句也很慢，搜索了一下tid为30050的记录数：

ius=# select count(*) from taxi_trip where tid=30050;
 count 
-------
  1042
(1 row)
ius=# select count(*) from taxi_point where tid=30050;
 count 
-------
 92090
(1 row)

两个表中的相关数据都不大，为啥执行速度慢呢？再进行测试，将限定条件前置，看看是不是array select导致的问题：

update taxi_trip tp set geom=ST_SetSRID(ST_MakeLine(array(select ST_MakePoint(lon,lat) from taxi_point t where t.tid=30050 and t.tid=tp.tid and t.gpstime>=tp.pickup_ts and t.gpstime<=tp.dropoff_ts order by gpstime asc)),4326) where tid=30050;

有点诧异的是，这个SQL语句执行的速度很快！说明最外层的限定条件在查询优化时并没有传递到array select子查询语句中。那要解决此问题，就需要使用自定义函数来实现了，又遇到了坑：

create or replace function update_trip_geom() returns void as $$
declare
curtid integer;
usql text;
begin
  for curtid in select distinct tid from taxi_trip order by tid 
  loop
     raise notice 'Dealing with tid: %',curtid;
     update taxi_trip tp set geom=ST_SetSRID(ST_MakeLine(array(select ST_MakePoint(lon,lat) from taxi_point t where t.tid=curtid and t.gpstime>=tp.pickup_ts and t.gpstime<=tp.dropoff_ts order by gpstime asc)),4326) where tid=curtid;
     
  end loop;
end;
$$ LANGUAGE plpgsql;

执行这个函数，发现又回归了龟速级别，说明tid参数并没有进行查询优化，于是进一步修改，尝试使用文本sql来代替执行：

create or replace function update_trip_geom() returns void as $$
declare
curtid integer;
usql text;
begin
  for curtid in select distinct tid from taxi_trip order by tid 
  loop
     raise notice 'Dealing with tid: %',curtid;
     usql := 'update taxi_trip tp set geom=ST_SetSRID(ST_MakeLine(array(select ST_MakePoint(lon,lat) from taxi_point t where t.tid='||curtid||' and t.gpstime>=tp.pickup_ts and t.gpstime<=tp.dropoff_ts order by gpstime asc)),4326) where tid='||curtid||';'; 
     execute usql; 
  end loop;
end;
$$ LANGUAGE plpgsql;

终于回归到正常速度了。

已发布

2018-06-07

分类

PostGIS, postgresql

来自

wlx

标签：

array, greenplum, PostGIS, sql

《 “SQL问题：array select子查询优化” 》有 13 条评论

online order enclomiphene no prescription overnight delivery

2025-08-17

buy enclomiphene generic online mastercard

buy cheap enclomiphene uk cheap purchase buy

回复
commander kamagra sans ordonnance

2025-08-17

kamagra sans prescription médicale

nuit kamagra ups livraison

回复
how to order androxal cheap generic uk

2025-08-17

purchase androxal buy germany

androxal over night

回复
dutasteride prices walmart

2025-08-17

order dutasteride cheap no prescription

buy cheap dutasteride uk order

回复
flexeril cyclobenzaprine no doctor prescription

2025-08-17

how to buy flexeril cyclobenzaprine buy adelaide

order 5 pills of flexeril cyclobenzaprine

回复
generic gabapentin from canada

2025-08-17

gabapentin fedex delivery

buy cheap gabapentin buy uk no prescription

回复
how to buy fildena canada with no prescription

2025-08-17

how to get fildena prescription

buying fildena australia buy online

回复
how to order staxyn generic pricing

2025-08-18

how to buy staxyn australia price

order staxyn buy online usa

回复
buy itraconazole purchase england

2025-08-18

discount itraconazole cheap alternatives

No perscription itraconazole

回复
online order avodart cheap canada pharmacy

2025-08-18

ordering avodart cheap usa

buy avodart generic good

回复
purchase rifaximin us pharmacies

2025-08-18

buy cheap rifaximin no prescription needed

cheap rifaximin cheap online no prescription

回复
discount xifaxan cost effectiveness

2025-08-18

buy xifaxan purchase from uk

where to buy xifaxan in the usa without a prescription

回复
obecné kamagra rychlé dodání online

2025-08-18

koupit kamagra ve velké británii

sleva kanadská lékárna kamagra

回复

SQL问题：array select子查询优化

评论

《 “SQL问题：array select子查询优化” 》 有 13 条评论

发表回复 取消回复

《 “SQL问题：array select子查询优化” 》有 13 条评论

发表回复取消回复