PINE64
ROCKPRO64 with 10GbE NICs - Printable Version

+- PINE64 (https://forum.pine64.org)
+-- Forum: ROCKPRO64 (https://forum.pine64.org/forumdisplay.php?fid=98)
+--- Forum: General Discussion on ROCKPRO64 (https://forum.pine64.org/forumdisplay.php?fid=99)
+--- Thread: ROCKPRO64 with 10GbE NICs (/showthread.php?tid=6964)

Pages: 1 2


ROCKPRO64 with 10GbE NICs - H.HSEL - 12-17-2018

I bought a RockPro64 and am trying to build a budget 10GbE storage.
I plugged an AQC107-based ASUS 10GbE card (XG-C100C) into its PCIe x4 slot and a Samsung 860 EVO 1TB via a USB-C to SATA conversion cable (lsusb shows "ASM1051E SATA 6Gb/s bridge, ASM1053E SATA 6Gb/s bridge, ASM1153 SATA 3Gb/s bridge"), and had no problem mounting it.

The image I chose this time is stretch-minimal-rockpro64-0.7.11-1075-arm64, and the driver for the card is the Aquantia Atlantic driver (Atlantic-2.0.15.0).
I downloaded the driver from Aquantia's website and had no problem building and installing.
After 'modprobe atlantic' the board immediately recognized the card as enp1s0. Idle power consumption (with all devices active) is around 10W, of which the 10GbE card accounts for almost half.

After changing the MTU from 1500 (default) to 9000, I ran iperf and got around 2.70Gbps, though the result is not stable; sometimes it drops to 2.30Gbps or lower.
I don't know why, but manually pinning the iperf process to a specific core (taskset -c 5 iperf -s, for example) seems to give a better and more stable result of about 3.00Gbps.
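
For reference, roughly the commands involved (the interface is enp1s0 as above; the MTU could equally be set via ifconfig or a network config file):
Code:
# enable jumbo frames on the 10GbE interface
sudo ip link set enp1s0 mtu 9000
# run the iperf server pinned to a single A72 core (core 5 here)
sudo taskset -c 5 iperf -s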

At this point the effective speed as a NAS (sharing the SSD through SMB, mounting it from a Windows client, transferring large files manually and timing how long they take) is approx. 330MB/s for both read and write. CrystalDiskMark 6.0.2 reports 4K random read/write at 7MB/s, 4K Q8T8 and 4K Q32T1 reads both at 117MB/s, and the corresponding writes both at 100MB/s. The peak power consumption is just 15.0W, so it's very power-efficient compared to x86-based 10GbE systems.

This is a great gain, considering the limits the ROCK64 has.
My current impression is that the RockPro64 is an ideal solution for an HDD-based NAS, and it would be fun to buy more boards and build a distributed system. But it's a little slow for an SSD-based NAS, especially when looking at the random read/write values.

I wonder if anyone has had better results with 10GbE NICs and SSDs, so I can tell whether there is more to get out of this board or I've reached its limit.
Launching iperf with multiple streams (-P 6) greatly increased total bandwidth from at most 3Gbps to over 9Gbps (see the sketch below), but SMB is not multithreaded after all, and for personal use what matters most is higher single-thread performance.
I admit there is room for optimization, so I will post again if I see a significant improvement.
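
For completeness, the multi-stream test from the client side was along these lines (the server address 192.168.1.10 is just a placeholder):
Code:
# single stream, limited by one A72 core on the board
iperf -c 192.168.1.10
# six parallel streams spread the load across cores and nearly saturate the link
iperf -c 192.168.1.10 -P 6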


RE: ROCKPRO64 with 10GbE NICs - xalius - 12-17-2018

Hi, and thanks for the report. I have tested my RockPro64 with a Mellanox 10GbE card and came to similar conclusions: while you can use the network bandwidth, you need a multi-threaded/high-performance server process as well, otherwise you get limited by single-core A72 throughput... 330MB/s linear doesn't sound too bad, with the theoretical limit for the USB3/UAS SATA path being around 390MB/s...


RE: ROCKPRO64 with 10GbE NICs - fosf0r - 12-17-2018

Additional SMB info that I always post:

Don't forget "min protocol = SMB2" in your Samba config (and restart smbd), because otherwise it allows for SMB1, and Windows could accidentally select SMB1 instead of SMB2 unless you ALSO configured your Windows client specifically NOT to use SMB1.
On the Windows side: https://support.microsoft.com/en-us/help/2696547/how-to-detect-enable-and-disable-smbv1-smbv2-and-smbv3-in-windows-and
It's best to specify on both ends that you want SMB2 so there's no way it can mess up.

[ If you've already mitigated Windows against 'WannaCry', then you should have SMB1 off and SMB2 on already. ]

( In the future, we'll want SMB3, but Samba's SMB3 support is limited. You could try 3 if you are able, but I personally haven't tested Samba's SMB3 since I can't exceed SMB2 performance. )


RE: ROCKPRO64 with 10GbE NICs - H.HSEL - 12-18-2018

I have already done some quick performance tuning, including specifying the SMB protocol. Observing that CPU usage hits 100% during sequential read/write, it seems I've reached the limit xalius mentioned. And yes, considering the SSD is attached through a SATA-USB cable, sequential read/write speed is actually quite good, though 4K speeds are disproportionately slow compared to sequential ones (7MB/s is extremely slow), and there CPU usage does not reach 100%.
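
For anyone reproducing this, the single-core bottleneck is easiest to spot per core rather than as an overall figure (a sketch; mpstat comes from the sysstat package):
Code:
# watch per-core utilisation while a large SMB transfer is running
sudo apt install sysstat
mpstat -P ALL 1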

Actually, running a random 4K benchmark with fio on the SSD directly (still connected to the RockPro64 via the SATA-USB cable) showed that it's capable of at least 5500 IOPS, i.e. 22MB/s, for both read and write. I'll keep looking into whether I can improve on this.
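
The fio run was along these lines (a sketch assuming fio with the libaio engine; the test file path, size and queue depth are placeholders, and this read-only example avoids touching existing data):
Code:
# 4K random read against a file on the mounted SSD; repeat with --rw=randwrite for writes
fio --name=randread --filename=/mnt/ssd/fio.test --size=4G \
    --rw=randread --bs=4k --iodepth=32 --ioengine=libaio --direct=1 \
    --runtime=60 --time_based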

That said, the fact that you can get a NAS kit able to serve over 300MB/s sequential for well below 200USD (including the 10GbE card) is very impressive. It could serve close to 10Gbps if the workload is highly optimized for multithreading, though what kind of workloads can actually get that much bandwidth out of the board remains to be seen.
In terms of network storage, if one needs more sequential performance, it might be time to consider Armada 8040 based boards (quad A72, native SATA, 10GbE support) or just grabbing a cheap x86 ITX board and a Pentium or something. But then the total system cost (excluding storage) suddenly goes well over 300USD, and it starts to feel less like a hobby.


RE: ROCKPRO64 with 10GbE NICs - ddimension - 12-18-2018

(12-17-2018, 06:44 AM)H.HSEL Wrote: I bought a RockPro64 and am trying to build a budget 10GbE storage.
After changing the MTU from 1500 (default) to 9000, I ran iperf and got around 2.70Gbps, though the result is not stable; sometimes it drops to 2.30Gbps or lower.
I don't know why, but manually pinning the iperf process to a specific core (taskset -c 5 iperf -s, for example) seems to give a better and more stable result of about 3.00Gbps.

Hi!

I also tested Aquantia cards and came to the conclusion that the driver has poor performance, i.e. missing offloading capabilities. I also have a Tehuti tn4xxx network card. That one gave me about 9GBit/s single-threaded iperf3 performance, a lot better. Even VLAN offloading works.

If you want to give it a try, you have to define RX_REUSE_PAGES in tn40.h.
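
Roughly, assuming you build the out-of-tree tn40xx driver from Tehuti's source package (build steps and file layout may differ between driver versions):
Code:
# in the driver source, add/uncomment the define near the top of tn40.h:
#   #define RX_REUSE_PAGES
# then rebuild and reload the module
make
sudo make install
sudo modprobe -r tn40xx; sudo modprobe tn40xx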


RE: ROCKPRO64 with 10GbE NICs - H.HSEL - 12-19-2018

Thanks for the info, it's definitely worth considering, though I don't currently have one.
I've now found that the (user-configurable) offloading features of the Aquantia driver are all on by default (judging by the config header file mentioned in its README).
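
You can also cross-check what the running driver actually enables with ethtool (interface name enp1s0 as earlier in the thread):
Code:
# list the offload features the kernel reports as active on the NIC
ethtool -k enp1s0 | grep -E 'segmentation|scatter-gather|checksum'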

I changed the card to an Intel X550-T2 based card, installed the ixgbe driver and performed the same test; it serves at a more decent speed of 4.76Gbps, with CPU usage around 77% rather than the 100% observed with the Aquantia card.
Nominal performance (namely the CrystalDiskMark scores) also improved by about 10% (except 4K Q1T1 random R/W), though perceived performance was almost the same (Windows file transfer reports an actual speed of 330MB/s).


RE: ROCKPRO64 with 10GbE NICs - gokuz - 12-20-2018

(12-19-2018, 06:45 AM)H.HSEL Wrote: Intel X550-T2 

This is perfect for pfSense since it's Intel.
Any chance you could try it as a 2-port switch, perhaps connected to 2 computers with a ramdisk each?

Interested to see whether it achieves 10Gbps when it's only routing traffic, not writing anything on the RockPro64 (as sketched below). If this works, it could easily be paired with a MikroTik CRS305-1G-4S+IN at $100+.

Meaning,

  1. 10Gbps ISP WAN > RockPro64 (10GbE WAN router) > MikroTik (4-port 10GbE switch)
  2. A $200-250 10GbE router configuration
  3. Low power, low noise, potentially fanless 10GbE router
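
A minimal sketch of the forwarding-only test described above (the port names enp1s0f0/enp1s0f1 and the plain bridge are assumptions; a real WAN router would use routing plus NAT/firewall rules instead):
Code:
# bridge the two X550 ports so the board only forwards packets between the test hosts
sudo ip link add br0 type bridge
sudo ip link set enp1s0f0 master br0
sudo ip link set enp1s0f1 master br0
sudo ip link set enp1s0f0 up
sudo ip link set enp1s0f1 up
sudo ip link set br0 up
# then run iperf between the two attached machines and watch the board's CPU usage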



RE: ROCKPRO64 with 10GbE NICs - Ryan - 01-28-2019

How did you guys make Intel X550-T2 or Mellanox 10GbE card work on Rockpro64?

I tried both with ayufan's release and Linux always fails to boot (it keeps crashing and resetting the board) once the Intel/Mellanox NIC is plugged in.
Armbian's release can boot to the desktop, but the "lspci" command always crashes.

Which Linux image were you using, and is there any specific setting/configuration required?


RE: ROCKPRO64 with 10GbE NICs - Ryan - 01-31-2019

I found that the following CentOS 7 image for the RockPro64 works with 10GbE NIC adapters.
https://github.com/Project31/centos-pine64/releases/download/v7.4.1708-v5.56/centos7-rock64pro.img.xz

But it appears all PCIe adapters (including an NVMe SSD) only link up at PCIe Gen2 x2, as shown below.
(snip)
01:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM961/PM961 (prog-if 02 [NVM Express])
(snip)
LnkCap: Port #0, Speed 8GT/s, Width x4, ASPM not supported, Exit Latency L0s unlimited, L1 <64us
ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk-
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 5GT/s, Width x2, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
(snip)
And I tested again with ayufan's 0.7.11-1075 image: the NVMe SSD links at Gen2 x4 width, but the 10GbE NIC does not work with that image.
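
For reference, the negotiated vs. maximum link speed/width can be pulled straight out of lspci (device address 01:00.0 as in the output above):
Code:
# compare the card's maximum link (LnkCap) with what was actually negotiated (LnkSta)
sudo lspci -vv -s 01:00.0 | grep -E 'LnkCap:|LnkSta:'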


RE: ROCKPRO64 with 10GbE NICs - ddimension - 02-24-2019

(12-18-2018, 04:24 PM)ddimension Wrote:
(12-17-2018, 06:44 AM)H.HSEL Wrote: I bought a RockPro64 and am trying to build a budget 10GbE storage.
After changing the MTU from 1500 (default) to 9000, I ran iperf and got around 2.70Gbps, though the result is not stable; sometimes it drops to 2.30Gbps or lower.
I don't know why, but manually pinning the iperf process to a specific core (taskset -c 5 iperf -s, for example) seems to give a better and more stable result of about 3.00Gbps.

Hi!

I also tested Aquantia cards and came to the conclusion that the driver has poor performance, i.e. missing offloading capabilities. I also have a Tehuti tn4xxx network card. That one gave me about 9GBit/s single-threaded iperf3 performance, a lot better. Even VLAN offloading works.

If you want to give it a try, you have to define RX_REUSE_PAGES in tn40.h.

After long testing I experienced OOM under higher loads. The problem is ARM memory management and the DMA capabilities.
You should not set coherent_pool in the kernel args, and instead provide a big area of contiguous memory (which will be used for DMA), like:
cma=512M

Perhaps a smaller area is possible.
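
On ayufan's images the kernel command line normally comes from the extlinux config, so the change looks roughly like this (assuming /boot/extlinux/extlinux.conf; other images may use a different boot config):
Code:
# append the CMA size to the kernel command line, e.g. in /boot/extlinux/extlinux.conf:
#   append ... cma=512M
# after a reboot, confirm the reserved area:
dmesg | grep -i cma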

BTW, I use 4 lanes:
Code:
01:00.0 Ethernet controller: Tehuti Networks Ltd. TN9510 10GBase-T/NBASE-T Ethernet Adapter
    Subsystem: Tehuti Networks Ltd. Ethernet Adapter
    Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0
    Interrupt: pin A routed to IRQ 233
    Region 0: Memory at fa000000 (64-bit, prefetchable) [size=64K]
    Capabilities: [50] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Address: 00000000fee30040  Data: 0000
    Capabilities: [78] Power Management version 3
        Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
        Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [80] Express (v2) Endpoint, MSI 00
        DevCap:    MaxPayload 512 bytes, PhantFunc 0, Latency L0s <64ns, L1 <2us
            ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 0.000W
        DevCtl:    Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
            RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
            MaxPayload 256 bytes, MaxReadReq 512 bytes
        DevSta:    CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
        LnkCap:    Port #1, Speed 5GT/s, Width x4, ASPM L0s L1, Exit Latency L0s <512ns, L1 <2us
            ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
        LnkCtl:    ASPM Disabled; RCB 64 bytes Disabled- CommClk+
            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
        LnkSta:    Speed 5GT/s, Width x4, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        DevCap2: Completion Timeout: Range A, TimeoutDis+, LTR-, OBFF Not Supported
        DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
        LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
             Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
             Compliance De-emphasis: -6dB
        LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
             EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
    Capabilities: [100 v1] Virtual Channel
        Caps:    LPEVC=0 RefClk=100ns PATEntryBits=1
        Arb:    Fixed- WRR32- WRR64- WRR128-
        Ctrl:    ArbSelect=Fixed
        Status:    InProgress-
        VC0:    Caps:    PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
            Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
            Ctrl:    Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
            Status:    NegoPending- InProgress-
    Kernel driver in use: tn40xx
    Kernel modules: tn40xx