Load all offsets and timestamps at startup and share them with the worker processes.
Profile before:
14609247 function calls (14608852 primitive calls) in 118.278 CPU seconds
Profile after:
12232301 function calls (12231906 primitive calls) in 75.825 CPU seconds
Notes:
* Currently only works with -p 1
* Caching is mostly, but not completely, compatible with existing caches.
This needs more testing and more code review.
* There are probably many code paths that will throw exceptions.
* Not ready for general use yet, but OK for testing.
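
For reviewers, a rough sketch of the general idea (load the offset/timestamp table once in the parent, then hand it to each worker when the pool starts) might look like the following. This is not the code in this commit: every name here (load_index, init_worker, timestamp_at, lookup_all) is hypothetical, and the table is assumed to be a plain dict mapping offsets to timestamps.

```python
# Hypothetical sketch: none of these names come from the actual code base.
import multiprocessing

_INDEX = None  # per-process copy of the preloaded offset -> timestamp table


def load_index(rows):
    """Build the offset -> timestamp table once, in the parent process."""
    return {int(offset): float(ts) for offset, ts in rows}


def init_worker(index):
    """Pool initializer: stash the preloaded table in each worker."""
    global _INDEX
    _INDEX = index


def timestamp_at(offset):
    """Worker task: answer from the preloaded table, no per-task reload."""
    return _INDEX.get(offset)


def lookup_all(index, offsets, workers=1):
    """Share `index` with `workers` processes and look up every offset."""
    with multiprocessing.Pool(workers, initializer=init_worker,
                              initargs=(index,)) as pool:
        return pool.map(timestamp_at, offsets)
```

When run as a script, lookup_all should be called from under an `if __name__ == "__main__":` guard so that worker start-up does not re-run it. Passing the table through the pool initializer means it is handed over once per worker rather than once per task, which is the kind of saving the profile numbers above suggest.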