The idea:
This should be 3x slower:
This should be 3x faster:
All the latch pins would be tied together, and so would all the clock pins. Within each branch the serial connections would be made as normal, but each branch would have its own data pin. That way several branches could be loaded at once, and the whole thing would take only as long as loading a single branch.
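To make the timing claim concrete, here is a minimal software sketch of the wiring described above. It is a simulation only, not driver code: the `Branch` class, the `shift_out_parallel` function, the three branches, and the 8-bit branch length are all illustrative assumptions rather than anything taken from real hardware or a specific library.

```python
# Simulation only: models shared clock/latch lines with one data line per branch.
# Branch count and branch length are illustrative assumptions.

class Branch:
    """One serial chain: bits shift in on each clock pulse, outputs update on latch."""

    def __init__(self, length):
        self.shift = [0] * length    # internal shift stage
        self.outputs = [0] * length  # latched outputs

    def clock(self, data_bit):
        # The new bit enters at position 0; the oldest bit falls off the end.
        self.shift = [data_bit] + self.shift[:-1]

    def latch(self):
        self.outputs = list(self.shift)


def shift_out_parallel(branches, frames):
    """Load one frame into each branch at the same time; return clock pulses used.

    frames[i] is the list of bits for branches[i], in desired output order.
    """
    length = len(frames[0])
    assert all(len(f) == length for f in frames), "branches must be equal length"
    pulses = 0
    # Send the last bit first so that frames[i][0] ends up at output 0.
    for k in reversed(range(length)):
        # Set every branch's data line, then pulse the shared clock once.
        for branch, frame in zip(branches, frames):
            branch.clock(frame[k])
        pulses += 1
    # One pulse on the shared latch line updates every branch together.
    for branch in branches:
        branch.latch()
    return pulses


if __name__ == "__main__":
    frames = [
        [1, 0, 1, 0, 1, 0, 1, 0],
        [1, 1, 1, 1, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 1],
    ]

    # Three branches with separate data lines: 8 clock pulses in total.
    branches = [Branch(8) for _ in frames]
    print(shift_out_parallel(branches, frames))

    # The same 24 bits sent down one long chain: 24 clock pulses.
    flat = [bit for frame in frames for bit in frame]
    print(shift_out_parallel([Branch(24)], [flat]))
```

In this model the three-branch layout needs 8 clock pulses where a single 24-bit chain needs 24. On real hardware that saving would only appear if all the data pins can be set before each clock pulse without adding much time per bit, which the simulation does not measure.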
Would this idea work? Are there any foreseeable problems? I am already working on the code.