Collin Gray | High throughput shuffling

High throughput shuffling

Naïve approach with NumPy’s memmap (np.memmap)
why it doesn’t work
- it doesn’t exploit the kind of writes we are actually performing, which are sequential appends
- stride matters here too: you want a single tensor to always be contiguous
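
A minimal sketch of the stride point, assuming fixed-size samples stored as rows of a 2-D array. The sizes and the helper `sample_is_contiguous` are hypothetical, and an in-memory array stands in for the memmap (`np.memmap` takes the same `order` argument). In row-major order one sample is a single unbroken byte range; in column-major order its elements are spread across the whole store.

```python
import numpy as np


def sample_is_contiguous(store: np.ndarray, i: int) -> bool:
    """True if sample i occupies one unbroken byte range in the store."""
    return bool(store[i].flags["C_CONTIGUOUS"])


n_samples, dim = 4, 8  # hypothetical sizes, for illustration only

# Row-major: consecutive elements of one sample are 4 bytes apart.
row_major = np.zeros((n_samples, dim), dtype=np.float32, order="C")
# Column-major: consecutive elements of one sample are n_samples * 4 bytes apart.
col_major = np.zeros((n_samples, dim), dtype=np.float32, order="F")

print(row_major.strides, sample_is_contiguous(row_major, 0))
print(col_major.strides, sample_is_contiguous(col_major, 0))
```

With the row-major layout, appending a sample is one sequential write; with the column-major layout it touches `dim` separate locations.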
rewrite closer to hardware
working around page size
- match writes to pages in both offset and size: start each write on a page boundary and make its length a multiple of the page size
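
One way this could look, as a sketch rather than the actual implementation: buffer appends in memory and only issue writes that start on a page boundary and cover whole pages. `PageAlignedAppender` is a hypothetical helper, `os.pwrite` is Unix-only, and padding the final page with zeros is an assumption (the real payload length is returned so a reader can ignore the padding).

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE  # page size reported by the OS, commonly 4096


class PageAlignedAppender:
    """Hypothetical helper: buffers appends and writes whole pages only."""

    def __init__(self, path: str) -> None:
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        self.buf = bytearray()
        self.offset = 0  # next file offset; a multiple of PAGE after each flush

    def append(self, data: bytes) -> None:
        self.buf += data
        n_full = len(self.buf) // PAGE * PAGE  # largest whole-page prefix
        if n_full:
            self._write(bytes(self.buf[:n_full]))
            del self.buf[:n_full]

    def _write(self, chunk: bytes) -> None:
        view = memoryview(chunk)
        while view:  # loop in case the OS performs a short write
            written = os.pwrite(self.fd, view, self.offset)
            self.offset += written
            view = view[written:]

    def close(self) -> int:
        """Zero-pad the tail to a full page, write it, return the payload length."""
        payload = self.offset + len(self.buf)
        if self.buf:
            pad = -len(self.buf) % PAGE
            self._write(bytes(self.buf) + b"\0" * pad)
            self.buf.clear()
        os.close(self.fd)
        return payload


# Usage: append five odd-sized chunks, then inspect the resulting file.
path = os.path.join(tempfile.mkdtemp(), "shuffle.bin")
writer = PageAlignedAppender(path)
chunk_size = PAGE // 2 + 1  # deliberately not a divisor of the page size
for _ in range(5):
    writer.append(b"x" * chunk_size)
offset_before_close = writer.offset
payload_len = writer.close()
file_size = os.path.getsize(path)
```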
madvise
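
A hedged sketch of issuing `madvise` hints from Python (3.8+), which exposes the call as `mmap.mmap.madvise`. Which hint fits which phase is an assumption here: `MADV_SEQUENTIAL` for an append-style pass and `MADV_RANDOM` for shuffled access. The `MADV_*` constants only exist on platforms that support them, so the hypothetical `advise` helper checks before calling.

```python
import mmap
import tempfile


def advise(mm: mmap.mmap, name: str) -> bool:
    """Apply the named MADV_* hint if the platform exposes it; report whether it did."""
    option = getattr(mmap, name, None)
    if option is None or not hasattr(mm, "madvise"):
        return False
    mm.madvise(option)  # no start/length given: applies to the whole mapping
    return True


with tempfile.TemporaryFile() as f:
    f.write(b"\0" * (4 * mmap.PAGESIZE))  # a mapping needs a non-empty file
    f.flush()
    with mmap.mmap(f.fileno(), 0) as mm:
        # Assumed phase 1: sequential, page-sized writes.
        sequential_ok = advise(mm, "MADV_SEQUENTIAL")
        mm[: mmap.PAGESIZE] = b"a" * mmap.PAGESIZE
        # Assumed phase 2: random access while shuffling.
        random_ok = advise(mm, "MADV_RANDOM")
        first_byte = mm[0]
```

The hints are advisory only: the kernel is free to ignore them, so any benefit has to be measured.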