split: Splitting Text Files

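When processing data, I sometimes need to divide a large dataset into several parts and hand them to different machines or different accounts. I used to do this with sed or vi, and only recently found out that Linux has a dedicated tool for the job: split.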

split is simple to use: it can divide a file either by line count or by size.
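As a minimal sketch of the two modes, the commands below split a small generated file first by line count and then by size. The file name sample.txt and the prefixes lines_ and bytes_ are made up for illustration, and the size suffix 4K assumes GNU split.

```shell
# Work in a scratch directory so the generated pieces are easy to clean up.
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical sample input: 2500 lines, one number per line.
seq 1 2500 > sample.txt

# Split by line count: at most 1000 lines per piece.
split -l 1000 sample.txt lines_

# Split by size: at most 4096 bytes per piece (a line may be cut in the middle).
split -b 4K sample.txt bytes_

# Inspect the results.
wc -l lines_*
ls -l bytes_*
```

Splitting by lines keeps every record intact, which is usually what a CSV needs; splitting by bytes gives pieces of predictable size but can break a line across two files.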

$ split --help
Usage: split [OPTION]... [FILE [PREFIX]]
Output pieces of FILE to PREFIXaa, PREFIXab, ...;
default size is 1000 lines, and default PREFIX is 'x'.

With no FILE, or when FILE is -, read standard input.

Mandatory arguments to long options are mandatory for short options too.
  -a, --suffix-length=N   generate suffixes of length N (default 2)
      --additional-suffix=SUFFIX  append an additional SUFFIX to file names
  -b, --bytes=SIZE        put SIZE bytes per output file
  -C, --line-bytes=SIZE   put at most SIZE bytes of records per output file
  -d                      use numeric suffixes starting at 0, not alphabetic
      --numeric-suffixes[=FROM]  same as -d, but allow setting the start value
  -e, --elide-empty-files  do not generate empty output files with '-n'
      --filter=COMMAND    write to shell COMMAND; file name is $FILE
  -l, --lines=NUMBER      put NUMBER lines/records per output file
  -n, --number=CHUNKS     generate CHUNKS output files; see explanation below
  -t, --separator=SEP     use SEP instead of newline as the record separator;
                            '\0' (zero) specifies the NUL character
  -u, --unbuffered        immediately copy input to output with '-n r/...'
      --verbose           print a diagnostic just before each
                            output file is opened
      --help     display this help and exit
      --version  output version information and exit

The SIZE argument is an integer and optional unit (example: 10K is 10*1024).
Units are K,M,G,T,P,E,Z,Y (powers of 1024) or KB,MB,... (powers of 1000).

CHUNKS may be:
  N       split into N files based on size of input
  K/N     output Kth of N to stdout
  l/N     split into N files without splitting lines/records
  l/K/N   output Kth of N to stdout without splitting lines/records
  r/N     like 'l' but use round robin distribution
  r/K/N   likewise but only output Kth of N to stdout

GNU coreutils online help: <http://www.gnu.org/software/coreutils/>
Full documentation at: <http://www.gnu.org/software/coreutils/split>
or available locally via: info '(coreutils) split invocation'
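The CHUNKS forms in the help text above are handy when the goal is a fixed number of pieces rather than a fixed piece size. The following is a small sketch assuming GNU split; nums.txt and the prefix chunk_ are hypothetical names.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical input: 100 short lines.
seq 1 100 > nums.txt

# l/3: write exactly 3 files of roughly equal size, never cutting a line in half.
split -n l/3 nums.txt chunk_

# l/2/3: print only the 2nd of the 3 chunks to stdout, without creating files.
split -n l/2/3 nums.txt > second_only.txt

ls chunk_*
wc -l chunk_* second_only.txt
```

Because the chunks are sized by bytes, the line counts per piece can differ slightly; what `l/N` guarantees is the number of pieces and that no line is split.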

For example, to split large.csv into pieces of 2000 lines each, with the output files named using numeric suffixes (x00, x01, ...), run:

split -l 2000 -d large.csv
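To see what this produces, here is a self-contained sketch that builds a stand-in for large.csv, splits it, and checks that the pieces concatenate back to the original. The generated data and the explicit prefix part_ are assumptions added for illustration; the command in the post uses the default prefix x.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical stand-in for large.csv: 5000 rows of "number,value".
seq 1 5000 | sed 's/$/,value/' > large.csv

# Same options as above, plus an explicit output prefix "part_".
split -l 2000 -d large.csv part_

# Three pieces are created: part_00 and part_01 (2000 lines each), part_02 (1000 lines).
wc -l part_*

# Concatenating the pieces in name order reproduces the original file.
cat part_* | cmp - large.csv && echo "pieces reassemble to the original"
```

Note that only the first piece contains the CSV header row, if the file has one, so each consumer of the later pieces needs to know the column layout.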

