Each data file has data for ~18 variables (position data for ~18 components), so I am calculating 6 statistics for each of the 18 components over all data files. Also, some data files are small (~kB) and some are large (~MB).

Maybe I'm missing something, but this looks like a fairly small amount of data for O(n) calculations, so the I/O would be the bottleneck anyway. If I'm doing the math correctly, and depending on the actual sizes of the files, roughly 20 × MB is still MB, or at most a few GB. Have you considered reading all the data at once and doing the calculations (OpenMP-threaded, of course) directly on arrays?
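To make the suggestion concrete, here is a minimal sketch in C of the "everything in memory, then thread over components" idea. It assumes the files have already been read into one array stored component-major, and since I don't know which six statistics you need, it computes only min, max and mean as placeholders. The names `comp_stats`, `compute_stats` and `NCOMP` are made up for illustration.

```c
#include <stddef.h>
#include <float.h>

#define NCOMP 18  /* number of components, taken from the question */

/* Placeholder statistics; the actual six are not named in the question. */
typedef struct { double min, max, mean; } comp_stats;

/* Assumed layout: data[c * n + i] is sample i of component c,
   so each component is one contiguous run of n doubles. */
void compute_stats(const double *data, size_t n, comp_stats out[NCOMP])
{
    /* One component per loop iteration; iterations are independent,
       so no reduction clause or locking is needed. */
    #pragma omp parallel for
    for (int c = 0; c < NCOMP; ++c) {
        const double *col = data + (size_t)c * n;
        double lo = DBL_MAX, hi = -DBL_MAX, sum = 0.0;
        for (size_t i = 0; i < n; ++i) {
            if (col[i] < lo) lo = col[i];
            if (col[i] > hi) hi = col[i];
            sum += col[i];
        }
        out[c].min  = lo;
        out[c].max  = hi;
        out[c].mean = n ? sum / (double)n : 0.0;
    }
}
```

Compile with your compiler's OpenMP flag (e.g. `-fopenmp` for GCC); without it the pragma is ignored and the loop simply runs serially. With only 18 components the parallelism is limited to 18 threads at most, which fits the point above: for this amount of data the reading step is likely to dominate.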

