lines and slice
lines [ download ]
'lines' is a tool for extracting selected lines from a (large) data file. It takes as input a file containing the numbers of the lines you wish to extract. Those line numbers need not be in increasing order. When they are not, simple solutions in sed, perl, etc. will be very inefficient, as they make many passes over the file. R users might read the data into R and then use R's array subsetting facilities, but depending on the size of the file and the amount of RAM available, that can be frustrating or impossible. 'lines' performs this task efficiently, with a single pass through the file, by storing the offset of each newline that it passes and jumping back to those offsets when necessary. It is written in C. To compile, uncompress the archive, extract the files, enter the directory 'lines' and type 'make'.
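The offset-storing idea can be sketched in a few dozen lines of C. This is only an illustration of the technique, not the actual source of 'lines': the function name print_lines, the struct offsets table and the helper record_start are all hypothetical, and the sketch assumes the input stream is positioned at its beginning when the function is called. It seeks only when the stream is not already where it needs to be, so requests in increasing order never seek at all.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical sketch, not the real 'lines' source.
 * start[k] holds the stream offset at which line k+1 begins. */
struct offsets {
    long *start;
    size_t known;   /* number of line starts recorded so far */
    size_t cap;     /* allocated slots in start[] */
};

/* Record the current stream position as the start of the next line. */
static int record_start(struct offsets *t, FILE *in)
{
    if (t->known == t->cap) {
        size_t ncap = t->cap * 2;
        long *p = realloc(t->start, ncap * sizeof *p);
        if (p == NULL)
            return -1;
        t->start = p;
        t->cap = ncap;
    }
    /* On a pipe ftell() yields -1; a later fseek() to it then fails. */
    t->start[t->known++] = ftell(in);
    return 0;
}

/* Print the requested 1-based lines of `in` to `out`, in the order given.
 * Assumes `in` is positioned at offset 0. Returns 0 on success, -1 on a
 * seek failure, allocation failure, or a request for a nonexistent line. */
int print_lines(FILE *in, const long *wanted, size_t nwanted, FILE *out)
{
    assert(in != NULL && out != NULL);

    struct offsets t = { malloc(64 * sizeof(long)), 1, 64 };
    if (t.start == NULL)
        return -1;
    t.start[0] = 0;      /* line 1 starts at offset 0 */

    size_t cur = 1;      /* line at whose start the stream currently sits */
    int rc = 0;

    for (size_t i = 0; i < nwanted && rc == 0; i++) {
        if (wanted[i] < 1) { rc = -1; break; }
        size_t target = (size_t)wanted[i];
        int c;

        /* Nearest recorded line start at or before the target. */
        size_t from = target <= t.known ? target : t.known;
        if (from != cur) {
            if (fseek(in, t.start[from - 1], SEEK_SET) != 0) { rc = -1; break; }
            cur = from;
        }

        /* Scan forward over unseen lines, recording each new line start. */
        while (cur < target) {
            c = getc(in);
            if (c == EOF) { rc = -1; break; }
            if (c == '\n') {
                cur++;
                if (cur > t.known && record_start(&t, in) != 0) { rc = -1; break; }
            }
        }
        if (rc != 0)
            break;

        /* Copy the target line, including its newline if it has one. */
        int copied = 0;
        while ((c = getc(in)) != EOF) {
            putc(c, out);
            copied = 1;
            if (c == '\n')
                break;
        }
        if (!copied) { rc = -1; break; }   /* target is past end of file */
        cur++;
        if (c == '\n' && cur > t.known && record_start(&t, in) != 0)
            rc = -1;
    }

    free(t.start);
    return rc;
}
```

A caller would read the wanted line numbers into an array of long and pass stdin and stdout as the two streams. Because the sketch only calls fseek when it has to move away from the current position, input arriving through a pipe still works as long as the requested numbers are strictly increasing; asking for an earlier (or repeated) line on a pipe makes fseek fail and the function return -1.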
Example usage
## Desired line numbers are held in the file `wantlines'
$ cat wantlines
100
1
50
75
2
25
17

## Here's the first few lines of the file from which we're requesting the lines.
$ head file
line 001
line 002
line 003
line 004
line 005
line 006
line 007
line 008
line 009
line 010

## use `lines' to print out the lines
$ lines -f wantlines < file
line 100
line 001
line 050
line 075
line 002
line 025
line 017

## But NB you can't use a pipe unless the line numbers are increasing!
## (because you can't use stdio 'seek' with a pipe)
$ cat file | lines -f wantlines
line 100
fseek error
(Did you use a pipe and ask for non-monotonically increasing lines? You can't use a pipe in that case.)