lines and slice

lines [ download ]

'lines' is a tool for extracting selected lines from a (large) data file. It takes as input a file containing the numbers of the lines you wish to extract. Those line numbers need not be in increasing order. When they are not, simple solutions in sed, perl, etc. become very inefficient because they make many passes over the file. R users might read the data into R and then use R's array subsetting facilities, but depending on the size of the file and the amount of RAM available, that can be frustrating or impossible. 'lines' performs the task efficiently, in a single pass through the file, by storing the offset of each newline it passes and jumping back to a stored offset when an earlier line is requested.

It is written in C. To compile, uncompress the archive, extract the files, enter the directory 'lines' and type 'make'.
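The offset-storing idea can be sketched in a few dozen lines of C. This is only an illustration of the technique, not the source of 'lines' itself: the names line_index, index_push, copy_line and extract_lines are hypothetical, and the sketch assumes a seekable input (a regular file), so unlike the real tool it does not handle pipes at all.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Growable table: start[i] is the offset at which line i+1 begins. */
typedef struct {
    long *start;
    size_t count;      /* number of line starts recorded so far */
    size_t capacity;
} line_index;

static int index_push(line_index *ix, long offset)
{
    if (ix->count == ix->capacity) {
        size_t cap = ix->capacity ? ix->capacity * 2 : 1024;
        long *p = realloc(ix->start, cap * sizeof *p);
        if (p == NULL)
            return -1;
        ix->start = p;
        ix->capacity = cap;
    }
    ix->start[ix->count++] = offset;
    return 0;
}

/* Copy one line (with its newline, if any); return the bytes copied. */
static size_t copy_line(FILE *in, FILE *out)
{
    size_t copied = 0;
    int c;
    while ((c = getc(in)) != EOF) {
        putc(c, out);
        copied++;
        if (c == '\n')
            break;
    }
    return copied;
}

/*
 * Print the requested 1-based lines of `in` to `out`, in the order given.
 * No part of the file is scanned for newlines more than once.
 * Returns 0 on success, -1 on a bad line number, seek failure,
 * allocation failure, or a request past the end of the file.
 */
int extract_lines(FILE *in, const long *wanted, size_t nwanted, FILE *out)
{
    line_index ix = {NULL, 0, 0};
    long frontier = 0;   /* offset of the first byte not yet scanned */
    int at_eof = 0;
    int rc = -1;

    assert(in != NULL && out != NULL);
    if (index_push(&ix, 0L) != 0)   /* line 1 starts at offset 0 */
        goto done;

    for (size_t i = 0; i < nwanted; i++) {
        long n = wanted[i];
        if (n < 1)
            goto done;
        if ((size_t)n > ix.count && !at_eof) {
            /* Line not indexed yet: resume scanning where we stopped. */
            if (fseek(in, frontier, SEEK_SET) != 0)
                goto done;
            while ((size_t)n > ix.count) {
                int c = getc(in);
                if (c == EOF) {
                    at_eof = 1;
                    break;
                }
                if (c == '\n' && index_push(&ix, ftell(in)) != 0)
                    goto done;
            }
            frontier = ftell(in);
            if (frontier < 0)
                goto done;
        }
        if ((size_t)n > ix.count)
            goto done;              /* requested line is past EOF */
        /* Jump to the stored offset and emit the line. */
        if (fseek(in, ix.start[n - 1], SEEK_SET) != 0)
            goto done;
        if (copy_line(in, out) == 0)
            goto done;              /* offset was at EOF: no such line */
    }
    rc = 0;
done:
    free(ix.start);
    return rc;
}
```

The table costs one long per line seen so far, which is small compared with holding the data itself in memory. Because the sketch seeks before every read, it fails immediately on a pipe; supporting increasing requests on a pipe would require reading forward without seeking.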

Example usage

## Desired line numbers are held in the file `wantlines'
$ cat wantlines
## Here are the first few lines of the file from which we're requesting lines.
$ head file
line 001
line 002
line 003
line 004
line 005
line 006
line 007
line 008
line 009
line 010
## Use `lines' to print out the requested lines
$ lines -f wantlines < file 
line 100
line 001
line 050
line 075
line 002
line 025
line 017
## NB: you can't use a pipe unless the line numbers are increasing, because stdio's fseek does not work on a pipe.
$ cat file | lines -f wantlines
line 100
fseek error (Did you use a pipe and ask for non-monotonically increasing lines? You can't use a pipe in that case.) :
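
The pipe limitation shown above comes from the operating system rather than from 'lines': a pipe has no file position to return to. A program can test for this up front, as in the following minimal sketch. The helper name stream_is_seekable is hypothetical and is not claimed to be part of 'lines'; the behaviour described is what POSIX systems typically do.

```c
#include <assert.h>
#include <stdio.h>

/* Returns 1 if fp supports repositioning, 0 otherwise. */
int stream_is_seekable(FILE *fp)
{
    assert(fp != NULL);
    /* A zero-byte relative seek leaves the position unchanged on a
       regular file, but typically fails on pipes, FIFOs and sockets. */
    return fseek(fp, 0L, SEEK_CUR) == 0;
}
```

Called on stdin, a check of this kind would be expected to succeed for `lines -f wantlines < file` and fail for `cat file | lines -f wantlines`, which is the distinction the error message above reports.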