lines and slice

lines [ download ]

'lines' is a tool for extracting selected lines from a (large) data file. It takes as input a file listing the numbers of the lines you wish to extract. Those line numbers need not be in increasing order. When they are not, simple solutions in sed/perl etc. are very inefficient, because they make many passes over the file. R users might read the data into R and then use R's array subsetting facilities, but depending on the size of the file and the amount of RAM available, that can be frustrating or impossible. 'lines' performs this task efficiently: it scans forward through the file only once, storing the offset of each newline it passes and jumping back to a stored offset when an earlier line is requested. It is written in C. To compile, uncompress the archive, extract the files, enter the directory 'lines' and type 'make'.

Example usage

## Desired line numbers are held in the file `wantlines'
$ cat wantlines
100
1
50
75
2
25
17
## Here are the first few lines of the file from which we're requesting the lines.
$ head file
line 001
line 002
line 003
line 004
line 005
line 006
line 007
line 008
line 009
line 010
## use `lines' to print out the lines
$ lines -f wantlines < file 
line 100
line 001
line 050
line 075
line 002
line 025
line 017
## But NB: you can't use a pipe unless the line numbers are increasing, because a pipe is not seekable (fseek fails on it).
$ cat file | lines -f wantlines
line 100
fseek error (Did you use a pipe and ask for non-monotonically increasing lines? You can't use a pipe in that case.) :
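
The offset-recording idea behind 'lines' can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the tool's C source: the function name `extract_lines` and its arguments are hypothetical, and the sketch records the offset at which each line starts, whereas 'lines' is described as storing the offsets of the newlines. It needs a seekable stream, which is the same reason the pipe example above fails.

```python
import io


def extract_lines(stream, wanted):
    """Yield the requested 1-based lines from a seekable binary stream,
    in the order they are requested. Hypothetical helper for illustration."""
    offsets = [0]  # offsets[i] is the byte offset at which line i+1 starts
    for n in wanted:
        if n < 1:
            raise ValueError("line numbers start at 1")
        if n <= len(offsets):
            # The start of line n is already known: jump straight to it.
            stream.seek(offsets[n - 1])
        else:
            # Resume the forward scan at the furthest line start seen so far.
            stream.seek(offsets[-1])
            while len(offsets) < n:
                if not stream.readline():
                    raise EOFError("file has fewer than %d lines" % n)
                offsets.append(stream.tell())
        line = stream.readline()
        if not line:
            raise EOFError("file has fewer than %d lines" % n)
        if n == len(offsets):
            # Remember where the next line starts so the scan never re-reads.
            offsets.append(stream.tell())
        yield line


# Rebuild the 100-line example file in memory and request the same lines.
data = b"".join(b"line %03d\n" % i for i in range(1, 101))
wanted = [100, 1, 50, 75, 2, 25, 17]
for line in extract_lines(io.BytesIO(data), wanted):
    print(line.decode(), end="")
```

In this sketch the forward scan never revisits a line; a request for an earlier line costs one seek and one line read. The memory cost is one stored offset per line scanned, rather than the file contents.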

slice [ download ]

This tool slices up tabular data. The line and column numbers that you want are specified in two separate files; if either is omitted, all lines or all columns are printed. The line numbers must be in increasing order; see 'lines' above if they are not. The column delimiter is specified with the '-d' flag (it defaults to a single space character). The '-v' flag inverts the sense of matching (i.e. rows and columns are printed if they don't occur in the files). 'slice' is written in Perl. Under Linux and Mac OS X, just download it and make sure it's executable.

Example usage

$ cat data ## some tabular data
11 12 13
21 22 23
31 32 33
41 42 43
$ cat line-nums ## a file containing desired line numbers, one per line
1
3
$ ./slice -l line-nums < data ## extract the lines
11 12 13
31 32 33
$ cat col-nums ## desired column numbers
2
3
$ ./slice -c col-nums < data ## extract the columns
12 13
22 23
32 33
42 43
$ ./slice -l line-nums -c col-nums < data ## extract their intersection
12 13
32 33
$ ./slice -v -l line-nums -c col-nums < data ## extract everything not in their union
21
41
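
The selection rules shown in the session above can be mimicked with a short Python sketch. It is not the Perl script itself: the names `slice_table` and `_selector` are hypothetical, and two behaviours are assumptions rather than documented facts, namely that an omitted list still means "keep everything" when matching is inverted, and that output follows file order (the sketch uses sets, so it does not need the line numbers to be increasing).

```python
def _selector(nums, invert):
    """Return a predicate telling whether a 1-based index is kept."""
    if nums is None:
        # Assumption: an omitted list keeps everything, even when inverted.
        return lambda i: True
    wanted = set(nums)
    return lambda i: (i in wanted) != invert


def slice_table(rows, line_nums=None, col_nums=None, delim=" ", invert=False):
    """Return the selected rows, each reduced to the selected columns."""
    keep_line = _selector(line_nums, invert)
    keep_col = _selector(col_nums, invert)
    out = []
    for i, row in enumerate(rows, start=1):
        if not keep_line(i):
            continue
        fields = row.rstrip("\n").split(delim)
        kept = [f for j, f in enumerate(fields, start=1) if keep_col(j)]
        out.append(delim.join(kept))
    return out


# The same table and index lists as in the example session.
table = ["11 12 13", "21 22 23", "31 32 33", "41 42 43"]
line_nums = [1, 3]
col_nums = [2, 3]

for row in slice_table(table, line_nums, col_nums):
    print(row)
for row in slice_table(table, line_nums, col_nums, invert=True):
    print(row)
```

With both lists given and matching inverted, a cell is kept only when its row is absent from the line list and its column is absent from the column list, which is why the last command in the session prints just the first column of rows 2 and 4.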