I figured it out. It looks like all you need to do is change allwinner,tx-delay-ps = <200>; to a value greater than 200. With that change it works fine and I get full throughput. I think that when I was adjusting both values at once, fixing one would break the other.
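For anyone who wants to try the same thing, here is a rough sketch of the change as a device tree fragment. The `&emac` label is an assumption on my part and may not match the Ethernet node on your board, and 300 is only an example of a value above 200, not a tested recommendation.

```dts
/* Sketch only: the &emac label is assumed, use the label your
 * board's device tree actually gives the Ethernet MAC node. */
&emac {
	/* Was <200>; 300 is just an example of a value greater than 200. */
	allwinner,tx-delay-ps = <300>;
};
```

I left the other delay value untouched here, since changing both at once is what made it hard to tell which one was helping.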
I am looking at what I need to do to get it in the OS builds.