Combining files with pv

I have a directory with a very large file in it that is split up into pieces. I need this file to be in one big piece for a particular application (although that will be fixed soon).

In the meantime, I wanted a progress display to show me how long combining all of these files was going to take, so I whipped this up:

#!/usr/bin/env bash

cat $(ls blocks/blk*.dat | sort) | \
  pv -s $(($(du -c blocks | cut -f1 | tail -n 1)*512)) > timsblocks.dat

The files are pieces of the Bitcoin blockchain, if you're wondering. Here's what the script does:

Get the list of block files from ls, make sure they're sorted by name (I'm paranoid), and concatenate them in that order:

cat $(ls blocks/blk*.dat | sort)
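As an aside, bash already sorts the results of a glob by name, so the ls and sort are belt and braces. A small sketch to check the order for yourself; list_blocks is a made-up helper name, not something from the original script:

```shell
#!/usr/bin/env bash
# list_blocks is a hypothetical helper: print the blk*.dat files in a
# directory, one per line, in the order the shell expands the glob.
list_blocks() {
  local dir=$1
  printf '%s\n' "$dir"/blk*.dat
}
```

Running list_blocks blocks should print the same order that ls blocks/blk*.dat | sort gives for these zero-padded names.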

Get the size of the blocks directory; the -c switch adds a grand total of everything in there as the last output line:

du -c blocks

Keep only the first tab-separated field (the size) and cut off everything after it (similar to awk '{ print $1 }' but cooler):

cut -f1

Get just the total:

tail -n 1
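To see what cut and tail do to the du output, here is a small sketch run against made-up sample text. The sizes and the blocks/index name are invented for illustration and are not real output from my machine:

```shell
#!/usr/bin/env bash
# Made-up sample shaped like `du -c` output: size, a tab, then a name.
sample=$(printf '8\tblocks/index\n24\tblocks\n24\ttotal\n')

# cut -f1 keeps the first tab-separated field of every line;
# tail -n 1 keeps only the last line, which is the grand total.
total=$(printf '%s\n' "$sample" | cut -f1 | tail -n 1)

echo "$total"   # prints 24
```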

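Then the text inside $((...)) is treated as an arithmetic expression, so the total is multiplied by 512 because du reports sizes in 512-byte blocks here and I need them in bytes. (That's the default on macOS and the BSDs; GNU du on Linux defaults to 1024-byte units, so the multiplier would be 1024 there.) Since du measures disk usage for the whole directory, the result is an estimate rather than the exact length of the data, which is fine for a progress bar. Finally, that's passed to the -s switch of pv to tell it how much data to expect. And now my output looks like this: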

> ./combine.sh
 118GiB 0:10:07 [ 216MiB/s] [==========>                       ] 33% ETA 0:19:59

Much better than guessing when it'll be done.
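If the du estimate turns out to be too rough, the exact size can be summed file by file instead. This is only a sketch: total_bytes is a made-up helper name, and I've used wc -c rather than stat because stat's flags differ between GNU and BSD.

```shell
#!/usr/bin/env bash
# total_bytes is a hypothetical helper: add up the exact byte sizes of
# the files given as arguments and print the sum.
total_bytes() {
  local total=0 size f
  for f in "$@"; do
    size=$(wc -c < "$f")        # byte count of one file
    total=$((total + size))     # arithmetic tolerates wc's padding spaces
  done
  echo "$total"
}

# Possible usage, left commented out so nothing runs by accident:
# cat blocks/blk*.dat | pv -s "$(total_bytes blocks/blk*.dat)" > timsblocks.dat
```

This counts only the blk*.dat files, so the percentage and ETA from pv should line up with the data actually being piped through.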