SQL问题:array select子查询优化

在我们的数据库中,灌入了上海出租车数据,大概有36亿条数据,开始主要的目的是测试MPP数据库性能,也顺手开展了一些出租车问题的研究。

在从出租车点数据转化为线数据过程中,遇到了一个奇怪的sql优化问题。

点数据表为taxi_point:

ius=# \d taxi_point
  Append-Only Columnar Table "public.taxi_point"
 Column  |            Type             | Modifiers 
---------+-----------------------------+-----------
 tid     | integer                     | not null
 isempty | smallint                    | 
 lon     | real                        | 
 lat     | real                        | 
 gpstime | timestamp without time zone | 
Checksum: t
Distributed randomly

包含每段行程的表taxi_trip:

ius=# \d taxi_trip
               Table "public.taxi_trip"
   Column    |            Type             | Modifiers 
-------------+-----------------------------+-----------
 tid         | integer                     | not null
 isempty     | smallint                    | not null
 pickup_lon  | real                        | not null
 pickup_lat  | real                        | not null
 dropoff_lon | real                        | not null
 dropoff_lat | real                        | not null
 distance    | real                        | 
 pickup_ts   | timestamp without time zone | not null
 dropoff_ts  | timestamp without time zone | not null
 geom        | geometry(LineString,4326)   | 
Indexes:
    "taxi_trip_pkey" PRIMARY KEY, btree (tid, isempty, pickup_ts)
Distributed by: (tid, isempty, pickup_ts)

两个表对应的记录数量大致为千万级和十亿级别:

ius=# select count(*) from taxi_point;
   count    
------------
 1480896950
(1 row)

ius=# select count(*) from taxi_trip;
  count   
----------
 25110980
(1 row)

现在,想从taxi_point表中更新每段行程的线数据到taxi_trip表中的geom字段,直观上写出来SQL语句:

update taxi_trip tp set geom=ST_SetSRID(ST_MakeLine(array(select ST_MakePoint(lon,lat) from taxi_point t where t.tid=tp.tid and t.gpstime>=tp.pickup_ts and t.gpstime<=tp.dropoff_ts order by gpstime asc)),4326)

没想到,这个sql执行非常慢,三天都没有执行完。于是就想进一步优化,看看能不能提高速度。首先在一个点上的数据进行测试:

update taxi_trip tp set geom=ST_SetSRID(ST_MakeLine(array(select ST_MakePoint(lon,lat) from taxi_point t where t.tid=tp.tid and t.gpstime>=tp.pickup_ts and t.gpstime<=tp.dropoff_ts order by gpstime asc)),4326) where tid=30050;

发现这个语句也很慢,搜索了一下tid为30050的记录数:

ius=# select count(*) from taxi_trip where tid=30050;
 count 
-------
  1042
(1 row)
ius=# select count(*) from taxi_point where tid=30050;
 count 
-------
 92090
(1 row)

两个表中的相关数据都不大,为啥执行速度慢呢?再进行测试,将限定条件前置,看看是不是array select导致的问题:

update taxi_trip tp set geom=ST_SetSRID(ST_MakeLine(array(select ST_MakePoint(lon,lat) from taxi_point t where t.tid=30050 and t.tid=tp.tid and t.gpstime>=tp.pickup_ts and t.gpstime<=tp.dropoff_ts order by gpstime asc)),4326) where tid=30050;

有点诧异的是,这个SQL语句执行的速度很快!说明最外层的限定条件在查询优化时并没有传递到array select子查询语句中。那要解决此问题,就需要使用自定义函数来实现了,又遇到了坑:

create or replace function update_trip_geom() returns void as $$
declare
curtid integer;
usql text;
begin
  for curtid in select distinct tid from taxi_trip order by tid 
  loop
     raise notice 'Dealing with tid: %',curtid;
     update taxi_trip tp set geom=ST_SetSRID(ST_MakeLine(array(select ST_MakePoint(lon,lat) from taxi_point t where t.tid=curtid and t.gpstime>=tp.pickup_ts and t.gpstime<=tp.dropoff_ts order by gpstime asc)),4326) where tid=curtid;
     
  end loop;
end;
$$ LANGUAGE plpgsql;

执行这个函数,发现又回归了龟速级别,说明tid参数并没有进行查询优化,于是进一步修改,尝试使用文本sql来代替执行:

create or replace function update_trip_geom() returns void as $$
declare
curtid integer;
usql text;
begin
  for curtid in select distinct tid from taxi_trip order by tid 
  loop
     raise notice 'Dealing with tid: %',curtid;
     usql := 'update taxi_trip tp set geom=ST_SetSRID(ST_MakeLine(array(select ST_MakePoint(lon,lat) from taxi_point t where t.tid='||curtid||' and t.gpstime>=tp.pickup_ts and t.gpstime<=tp.dropoff_ts order by gpstime asc)),4326) where tid='||curtid||';'; 
     execute usql; 
  end loop;
end;
$$ LANGUAGE plpgsql;

终于回归到正常速度了。


已发布

分类

,

来自

标签:

评论

《 “SQL问题:array select子查询优化” 》 有 13 条评论

  1. buy enclomiphene generic online mastercard

    buy cheap enclomiphene uk cheap purchase buy

  2. kamagra sans prescription médicale

    nuit kamagra ups livraison

  3. purchase androxal buy germany

    androxal over night

  4. order dutasteride cheap no prescription

    buy cheap dutasteride uk order

  5. how to buy flexeril cyclobenzaprine buy adelaide

    order 5 pills of flexeril cyclobenzaprine

  6. gabapentin fedex delivery

    buy cheap gabapentin buy uk no prescription

  7. how to get fildena prescription

    buying fildena australia buy online

  8. how to buy staxyn australia price

    order staxyn buy online usa

  9. discount itraconazole cheap alternatives

    No perscription itraconazole

  10. ordering avodart cheap usa

    buy avodart generic good

  11. buy cheap rifaximin no prescription needed

    cheap rifaximin cheap online no prescription

  12. buy xifaxan purchase from uk

    where to buy xifaxan in the usa without a prescription

  13. koupit kamagra ve velké británii

    sleva kanadská lékárna kamagra

回复 obecné kamagra rychlé dodání online 取消回复

您的邮箱地址不会被公开。 必填项已用 * 标注