06-09-2019, 06:07 PM
I'm hoping someone can help with a problem I have with a new NAS build using OMV.
I have a RockPro 64, Pine64 NAS case, two ST8000DM004-2CX1 drives, 16 GB eMMC boot, Pine 5A supply. I am running Open Media Vault stretch-openmediavault-rockpro64-0.7.9-1067-armhf.img.xz (Linux rockpro64 4.4.132-1075-rockchip-ayufan-ga83beded8524 #1 SMP Thu Jul 26 08:22:22 UTC 2018 aarch64 GNU/Linux) with all updates listed in OMV applied.
Sometimes during boot, the SATA PCI board does not get power and thus OMV sees no data drives. I have confirmed with a DVM there is no 12VCC on the SATA controller chip pin 48 when this happens. About half the time it boots up correctly and OMV seems to work OK. I get ~100 MB/s over gigabit Ethernet from a PC to the NAS, so the performance is good. I have checked the +12V input to the RockPro64 and it ramps up monotonically when the power supply is connected, and only sags to 11.8V while the drives are spinning up.
When it's working, lspci -vvv says:
00:00.0 PCI bridge: Device 1d87:0100 (prog-if 00 [Normal decode])
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort+ <TAbort+ <MAbort+ >SERR+ <PERR+ INTx-
Latency: 0
Interrupt: pin A routed to IRQ 238
Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
I/O behind bridge: 00000000-00000fff
Memory behind bridge: fa000000-fa0fffff
Prefetchable memory behind bridge: 00000000-000fffff
Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
Capabilities: [80] Power Management version 3
Flags: PMEClk- DSI- D1+ D2- AuxCurrent=0mA PME(D0+,D1+,D2-,D3hot+,D3cold-)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME+
Capabilities: [90] MSI: Enable+ Count=1/1 Maskable+ 64bit+
Address: 00000000fee30040 Data: 0000
Masking: 00000000 Pending: 00000000
Capabilities: [b0] MSI-X: Enable- Count=1 Masked-
Vector table: BAR=0 offset=00000000
PBA: BAR=0 offset=00000008
Capabilities: [c0] Express (v2) Root Port (Slot+), MSI 00
DevCap: MaxPayload 256 bytes, PhantFunc 0
ExtTag- RBE+
DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
MaxPayload 128 bytes, MaxReadReq 512 bytes
DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
LnkCap: Port #0, Speed 5GT/s, Width x4, ASPM L1, Exit Latency L0s <256ns, L1 <8us
ClockPM- Surprise- LLActRep- BwNot+ ASPMOptComp+
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk-
ExtSynch- ClockPM- AutWidDis- BWInt+ AutBWInt+
LnkSta: Speed 5GT/s, Width x1, TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
Slot #0, PowerLimit 0.000W; Interlock- NoCompl-
SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
Control: AttnInd Off, PwrInd Off, Power+ Interlock-
SltSta: Status: AttnBtn- PowerFlt- MRL+ CmdCplt- PresDet- Interlock-
Changed: MRL- PresDet- LinkState-
RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna+ CRSVisible-
RootCap: CRSVisible-
RootSta: PME ReqID 0000, PMEStatus- PMEPending-
DevCap2: Completion Timeout: Range B, TimeoutDis+, LTR+, OBFF Via message ARIFwd+
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled ARIFwd-
LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
Capabilities: [100 v2] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
Capabilities: [274 v1] Transaction Processing Hints
Interrupt vector mode supported
Device specific mode supported
Steering table in TPH capability structure
Kernel driver in use: pcieport
01:00.0 SATA controller: ASMedia Technology Inc. ASM1062 Serial ATA Controller (rev 01) (prog-if 01 [AHCI 1.0])
Subsystem: ASMedia Technology Inc. ASM1062 Serial ATA Controller
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 239
Region 0: I/O ports at <unassigned> [disabled]
Region 1: I/O ports at <unassigned> [disabled]
Region 2: I/O ports at <unassigned> [disabled]
Region 3: I/O ports at <unassigned> [disabled]
Region 4: I/O ports at <unassigned> [disabled]
Region 5: Memory at fa010000 (32-bit, non-prefetchable) [size=512]
[virtual] Expansion ROM at fa000000 [disabled] [size=64K]
Capabilities: [50] MSI: Enable+ Count=1/1 Maskable- 64bit-
Address: fee30040 Data: 0000
Capabilities: [78] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [80] Express (v2) Legacy Endpoint, MSI 00
DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s <1us, L1 <8us
ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
MaxPayload 128 bytes, MaxReadReq 512 bytes
DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
LnkCap: Port #1, Speed 5GT/s, Width x1, ASPM not supported, Exit Latency L0s <512ns, L1 <2us
ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk-
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range ABC, TimeoutDis+, LTR-, OBFF Not Supported
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
Capabilities: [100 v1] Virtual Channel
Caps: LPEVC=0 RefClk=100ns PATEntryBits=1
Arb: Fixed- WRR32- WRR64- WRR128-
Ctrl: ArbSelect=Fixed
Status: InProgress-
VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
Status: NegoPending- InProgress-
Kernel driver in use: ahci
When it's not working lspci -vvv says nothing.
Looking at dmesg in both cases, I see suspicious entries about power configuration, but they are the same in both the working and no PCI power case:
0.502678] reg-fixed-voltage vcc3v3-pcie-regulator: Looking up vin-supply from device tree
[ 0.502716] vcc3v3_pcie: supplied by dc_12v
[ 0.502740] dc_12v: could not add device link regulator.2 err -2
[ 0.502781] vcc3v3_pcie: 3300 mV
[ 0.502937] reg-fixed-voltage vcc3v3-pcie-regulator: vcc3v3_pcie supplying 3300000uV
[ 0.502989] of_get_named_gpiod_flags: can't parse 'gpio' property of node '/vcc1v8-s0[0]'
[ 0.503005] vcc1v8_s0: 1800 mV
[ 0.503152] reg-fixed-voltage vcc1v8-s0: vcc1v8_s0 supplying 1800000uV
[ 2.053627] vcc3v3_pcie: disabling
The key difference in the dmesg outputs is the one that has no PCI power says:
rockchip-pcie f8000000.pcie: PCIe link training gen1 timeout!
rockchip-pcie: probe of f8000000.pcie failed with error -110
The one that works has a long section of status messages that begins:
PCI host bridge /pcie@f8000000 ranges:
MEM 0xfa000000..0xfbdfffff -> 0xfa000000
IO 0xfbe00000..0xfbefffff -> 0xfbe00000
rockchip-pcie f8000000.pcie: PCI host bridge to bus 0000:00
This failure mode is easily observable during boot. If there is power to the SATA controller card, you can see it's LED blink through the case grille. If there is no power to the board, no LED of course.
Does anyone have any ideas about what I could do to troubleshoot this? I don't want a file server that only boots 1/2 the time.
I have a RockPro 64, Pine64 NAS case, two ST8000DM004-2CX1 drives, 16 GB eMMC boot, Pine 5A supply. I am running Open Media Vault stretch-openmediavault-rockpro64-0.7.9-1067-armhf.img.xz (Linux rockpro64 4.4.132-1075-rockchip-ayufan-ga83beded8524 #1 SMP Thu Jul 26 08:22:22 UTC 2018 aarch64 GNU/Linux) with all updates listed in OMV applied.
Sometimes during boot, the SATA PCI board does not get power and thus OMV sees no data drives. I have confirmed with a DVM there is no 12VCC on the SATA controller chip pin 48 when this happens. About half the time it boots up correctly and OMV seems to work OK. I get ~100 MB/s over gigabit Ethernet from a PC to the NAS, so the performance is good. I have checked the +12V input to the RockPro64 and it ramps up monotonically when the power supply is connected, and only sags to 11.8V while the drives are spinning up.
When it's working, lspci -vvv says:
00:00.0 PCI bridge: Device 1d87:0100 (prog-if 00 [Normal decode])
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort+ <TAbort+ <MAbort+ >SERR+ <PERR+ INTx-
Latency: 0
Interrupt: pin A routed to IRQ 238
Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
I/O behind bridge: 00000000-00000fff
Memory behind bridge: fa000000-fa0fffff
Prefetchable memory behind bridge: 00000000-000fffff
Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
Capabilities: [80] Power Management version 3
Flags: PMEClk- DSI- D1+ D2- AuxCurrent=0mA PME(D0+,D1+,D2-,D3hot+,D3cold-)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME+
Capabilities: [90] MSI: Enable+ Count=1/1 Maskable+ 64bit+
Address: 00000000fee30040 Data: 0000
Masking: 00000000 Pending: 00000000
Capabilities: [b0] MSI-X: Enable- Count=1 Masked-
Vector table: BAR=0 offset=00000000
PBA: BAR=0 offset=00000008
Capabilities: [c0] Express (v2) Root Port (Slot+), MSI 00
DevCap: MaxPayload 256 bytes, PhantFunc 0
ExtTag- RBE+
DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
MaxPayload 128 bytes, MaxReadReq 512 bytes
DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
LnkCap: Port #0, Speed 5GT/s, Width x4, ASPM L1, Exit Latency L0s <256ns, L1 <8us
ClockPM- Surprise- LLActRep- BwNot+ ASPMOptComp+
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk-
ExtSynch- ClockPM- AutWidDis- BWInt+ AutBWInt+
LnkSta: Speed 5GT/s, Width x1, TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
Slot #0, PowerLimit 0.000W; Interlock- NoCompl-
SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
Control: AttnInd Off, PwrInd Off, Power+ Interlock-
SltSta: Status: AttnBtn- PowerFlt- MRL+ CmdCplt- PresDet- Interlock-
Changed: MRL- PresDet- LinkState-
RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna+ CRSVisible-
RootCap: CRSVisible-
RootSta: PME ReqID 0000, PMEStatus- PMEPending-
DevCap2: Completion Timeout: Range B, TimeoutDis+, LTR+, OBFF Via message ARIFwd+
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled ARIFwd-
LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
Capabilities: [100 v2] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
Capabilities: [274 v1] Transaction Processing Hints
Interrupt vector mode supported
Device specific mode supported
Steering table in TPH capability structure
Kernel driver in use: pcieport
01:00.0 SATA controller: ASMedia Technology Inc. ASM1062 Serial ATA Controller (rev 01) (prog-if 01 [AHCI 1.0])
Subsystem: ASMedia Technology Inc. ASM1062 Serial ATA Controller
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 239
Region 0: I/O ports at <unassigned> [disabled]
Region 1: I/O ports at <unassigned> [disabled]
Region 2: I/O ports at <unassigned> [disabled]
Region 3: I/O ports at <unassigned> [disabled]
Region 4: I/O ports at <unassigned> [disabled]
Region 5: Memory at fa010000 (32-bit, non-prefetchable) [size=512]
[virtual] Expansion ROM at fa000000 [disabled] [size=64K]
Capabilities: [50] MSI: Enable+ Count=1/1 Maskable- 64bit-
Address: fee30040 Data: 0000
Capabilities: [78] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [80] Express (v2) Legacy Endpoint, MSI 00
DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s <1us, L1 <8us
ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
MaxPayload 128 bytes, MaxReadReq 512 bytes
DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
LnkCap: Port #1, Speed 5GT/s, Width x1, ASPM not supported, Exit Latency L0s <512ns, L1 <2us
ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk-
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range ABC, TimeoutDis+, LTR-, OBFF Not Supported
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
Capabilities: [100 v1] Virtual Channel
Caps: LPEVC=0 RefClk=100ns PATEntryBits=1
Arb: Fixed- WRR32- WRR64- WRR128-
Ctrl: ArbSelect=Fixed
Status: InProgress-
VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
Status: NegoPending- InProgress-
Kernel driver in use: ahci
When it's not working lspci -vvv says nothing.
Looking at dmesg in both cases, I see suspicious entries about power configuration, but they are the same in both the working and no PCI power case:
0.502678] reg-fixed-voltage vcc3v3-pcie-regulator: Looking up vin-supply from device tree
[ 0.502716] vcc3v3_pcie: supplied by dc_12v
[ 0.502740] dc_12v: could not add device link regulator.2 err -2
[ 0.502781] vcc3v3_pcie: 3300 mV
[ 0.502937] reg-fixed-voltage vcc3v3-pcie-regulator: vcc3v3_pcie supplying 3300000uV
[ 0.502989] of_get_named_gpiod_flags: can't parse 'gpio' property of node '/vcc1v8-s0[0]'
[ 0.503005] vcc1v8_s0: 1800 mV
[ 0.503152] reg-fixed-voltage vcc1v8-s0: vcc1v8_s0 supplying 1800000uV
[ 2.053627] vcc3v3_pcie: disabling
The key difference in the dmesg outputs is the one that has no PCI power says:
rockchip-pcie f8000000.pcie: PCIe link training gen1 timeout!
rockchip-pcie: probe of f8000000.pcie failed with error -110
The one that works has a long section of status messages that begins:
PCI host bridge /pcie@f8000000 ranges:
MEM 0xfa000000..0xfbdfffff -> 0xfa000000
IO 0xfbe00000..0xfbefffff -> 0xfbe00000
rockchip-pcie f8000000.pcie: PCI host bridge to bus 0000:00
This failure mode is easily observable during boot. If there is power to the SATA controller card, you can see it's LED blink through the case grille. If there is no power to the board, no LED of course.
Does anyone have any ideas about what I could do to troubleshoot this? I don't want a file server that only boots 1/2 the time.