PCIE SATA controller ASM1062 problems or software
#11
Emask=0x10 can probably be decoded with the enum in the current Linux source, torvalds/linux/include/linux/libata.h:

Code:
enum ata_completion_errors {
AC_ERR_DEV = (1 << 0), /* device reported error */
AC_ERR_HSM = (1 << 1), /* host state machine violation */
AC_ERR_TIMEOUT = (1 << 2), /* timeout */
AC_ERR_MEDIA = (1 << 3), /* media error */
AC_ERR_ATA_BUS = (1 << 4), /* ATA bus error */
AC_ERR_HOST_BUS = (1 << 5), /* host bus error */
AC_ERR_SYSTEM = (1 << 6), /* system error */
AC_ERR_INVALID = (1 << 7), /* invalid argument */
AC_ERR_OTHER = (1 << 8), /* unknown */
AC_ERR_NODEV_HINT = (1 << 9), /* polling device detection hint */
AC_ERR_NCQ = (1 << 10), /* marker for offending NCQ qc */
};

0x10 means bit 4 is set: ATA bus error. Not a device-reported error, not an SBC failure, not a drive media failure, not an SBC bus (PCIe) error, not an unknown error. Exactly an ATA bus error. This means that the SATA controller itself has reported a bus error. Looking further...

SErr=0x4040000 means two bits of the SATA error register are set: bit 18 and bit 26.
According to the available draft of Intel's SATA AHCI standard, these bits mean:

  • A Comm Wake signal was detected by the Phy;
  • A COMINIT signal was received.

I can't say the signals themselves are abnormal. They are not errors like CRC or protocol failures. But the libata code does not expect these signals at this moment.
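The bit arithmetic above can be checked with a short script. This is only a sketch: the Emask names are copied from the enum quoted in this post, the two SErr descriptions are the ones given in the bullets above, and the helper name `decode_mask` is made up for illustration.

```python
# Bit names copied from the ata_completion_errors enum quoted above.
AC_ERR_BITS = {
    0: "AC_ERR_DEV",
    1: "AC_ERR_HSM",
    2: "AC_ERR_TIMEOUT",
    3: "AC_ERR_MEDIA",
    4: "AC_ERR_ATA_BUS",
    5: "AC_ERR_HOST_BUS",
    6: "AC_ERR_SYSTEM",
    7: "AC_ERR_INVALID",
    8: "AC_ERR_OTHER",
    9: "AC_ERR_NODEV_HINT",
    10: "AC_ERR_NCQ",
}

# Only the two SErr bits discussed in this post; all other bits
# are deliberately left unnamed here.
SERR_BITS = {
    18: "Comm Wake signal detected by the Phy",
    26: "COMINIT signal received",
}


def decode_mask(value, names):
    """Return (bit, name) pairs for every bit set in value (hypothetical helper)."""
    result = []
    bit = 0
    while value >> bit:
        if (value >> bit) & 1:
            result.append((bit, names.get(bit, "unnamed")))
        bit += 1
    return result


print(decode_mask(0x10, AC_ERR_BITS))
print(decode_mask(0x4040000, SERR_BITS))
```

Running it lists bit 4 for Emask and bits 18 and 26 for SErr, which matches the reading above.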
#12
Hello Nikolay, 

Thank you for looking into it! I just thought I may have provided the wrong logs. It would probably be more useful to compare a successful OMV boot with an unsuccessful OMV boot, rather than with a successful Armbian boot, as I couldn't reproduce the problem with Armbian.
I finally found the attachment button on this forum.


Attached Files
.txt   OMVdmesgDrivesNotWorking.txt (Size: 172.82 KB / Downloads: 327)
.txt   OMVdmesgDrivesOK.txt (Size: 67.03 KB / Downloads: 632)
#13
The controller is SATA 3.0 capable, up to 6Gbps speed:

ahci 0000:01:00.0: AHCI 0001.0200 32 slots 2 ports 6 Gbps 0x3 impl SATA mode

Good start:

Code:
...
[    2.892158] ahci 0000:01:00.0: version 3.0
[    2.892187] ahci 0000:01:00.0: enabling device (0000 -> 0002)
[    2.897587] ahci 0000:01:00.0: SSS flag set, parallel bus scan disabled
[    2.902878] ahci 0000:01:00.0: AHCI 0001.0200 32 slots 2 ports 6 Gbps 0x3 impl SATA mode
[    2.908140] ahci 0000:01:00.0: flags: 64bit ncq sntf stag led clo pmp pio slum part ccc sxs
[    2.912936] scsi host0: ahci
[    2.916136] scsi host1: ahci
[    2.919593] ata1: SATA max UDMA/133 abar m512@0xfa010000 port 0xfa010100 irq 239
[    2.926758] ata2: SATA max UDMA/133 abar m512@0xfa010000 port 0xfa010180 irq 239
[    3.028283] md: linear personality registered for level -1
[    3.045306] md: multipath personality registered for level -4
[    3.057861] md: raid0 personality registered for level 0
[    3.071080] md: raid1 personality registered for level 1
[    3.082245] async_tx: api initialized (async)
[    3.089432] md: raid6 personality registered for level 6
[    3.092450] md: raid5 personality registered for level 5
[    3.095382] md: raid4 personality registered for level 4
[    3.110664] md: raid10 personality registered for level 10
[    3.393371] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[    3.404767] ata1.00: ATA-10: WDC WD40EFRX-68N32N0, 82.00A82, max UDMA/133
[    3.415416] ata1.00: 7814037168 sectors, multi 16: LBA48 NCQ (depth 31/32), AA
[    3.427469] ata1.00: configured for UDMA/133
[    3.435658] scsi 0:0:0:0: Direct-Access     ATA      WDC WD40EFRX-68N 0A82 PQ: 0 ANSI: 5
[    3.442864] sd 0:0:0:0: [sda] 7814037168 512-byte logical blocks: (4.00 TB/3.64 TiB)
[    3.448774] sd 0:0:0:0: [sda] 4096-byte physical blocks
[    3.456362] sd 0:0:0:0: [sda] Write Protect is off
[    3.459313] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
[    3.460473] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[    3.517821] EXT4-fs (mmcblk0p7): mounted filesystem with writeback data mode. Opts: (null)
[    3.889868]  sda: sda1
[    3.903345] sd 0:0:0:0: [sda] Attached SCSI disk
[    3.918439] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[    3.931274] ata2.00: ATA-10: WDC WD40EFRX-68N32N0, 82.00A82, max UDMA/133
[    3.943376] ata2.00: 7814037168 sectors, multi 16: LBA48 NCQ (depth 31/32), AA
[    3.957284] ata2.00: configured for UDMA/133
[    3.966477] scsi 1:0:0:0: Direct-Access     ATA      WDC WD40EFRX-68N 0A82 PQ: 0 ANSI: 5
[    3.976806] sd 1:0:0:0: [sdb] 7814037168 512-byte logical blocks: (4.00 TB/3.64 TiB)
[    3.985707] sd 1:0:0:0: [sdb] 4096-byte physical blocks
[    3.994681] sd 1:0:0:0: [sdb] Write Protect is off
[    4.003161] sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
[    4.003390] sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
...
[    4.434918]  sdb: sdb1
[    4.457494] sd 1:0:0:0: [sdb] Attached SCSI disk
...
[    6.212890] systemd[1]: Started Journal Service.
[    6.264396] systemd-journald[333]: Received request to flush runtime journal from PID 1
...
[    7.139122] ata2.00: configured for UDMA/133
[    7.143767] ata2: EH complete
[    7.149571] sd 1:0:0:0: [sdb] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[    7.188731] ata1.00: configured for UDMA/133
[    7.193347] ata1: EH complete
[    7.198255] sd 0:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[    7.454191] FAT-fs (mmcblk0p6): utf8 is not a recommended IO charset for FAT filesystems, filesystem will be case sensitive!
[    7.481085] FAT-fs (mmcblk0p6): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
[    8.111178] EXT4-fs (sdb1): mounted filesystem with ordered data mode. Opts: user_xattr,usrjquota=aquota.user,grpjquota=aquota.group,jqfmt=vfsv0,acl
[    8.123701] EXT4-fs (sda1): mounted filesystem with ordered data mode. Opts: user_xattr,usrjquota=aquota.user,grpjquota=aquota.group,jqfmt=vfsv0,acl
...


Bad start:
Code:
...
[    2.603078] ahci 0000:01:00.0: version 3.0
[    2.603088] ahci 0000:01:00.0: enabling device (0000 -> 0002)
[    2.604895] ahci 0000:01:00.0: SSS flag set, parallel bus scan disabled
[    2.606642] ahci 0000:01:00.0: AHCI 0001.0200 32 slots 2 ports 6 Gbps 0x3 impl SATA mode
[    2.608467] ahci 0000:01:00.0: flags: 64bit ncq sntf stag led clo pmp pio slum part ccc sxs
[    2.611101] scsi host0: ahci
[    2.612919] scsi host1: ahci
[    2.614743] ata1: SATA max UDMA/133 abar m512@0xfa010000 port 0xfa010100 irq 239
[    2.616930] ata2: SATA max UDMA/133 abar m512@0xfa010000 port 0xfa010180 irq 239
[    2.875669] EXT4-fs (mmcblk0p7): mounted filesystem with writeback data mode. Opts: (null)
...
[    4.741549] systemd[1]: Started Journal Service.
[    4.786973] systemd-journald[337]: Received request to flush runtime journal from PID 1
[    4.822384] ata1: SATA link down (SStatus 1 SControl 300)
[    4.826507] ata1: exception Emask 0x10 SAct 0x0 SErr 0x4040000 action 0xe frozen t4
[    4.830837] ata1: irq_stat 0x00000040, connection status changed
[    4.834850] ata1: SError: { CommWake DevExch }
[    4.838688] ata1: hard resetting link
...
[    5.317967] FAT-fs (mmcblk0p6): utf8 is not a recommended IO charset for FAT filesystems, filesystem will be case sensitive!
[    5.338342] FAT-fs (mmcblk0p6): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
[    7.048589] ata1: SATA link down (SStatus 1 SControl 300)
[    7.062679] ata1: EH complete
[    7.076432] ata1: exception Emask 0x10 SAct 0x0 SErr 0x4040000 action 0xe frozen
[    7.083094] ata1: irq_stat 0x00000040, connection status changed
[    7.089241] ata1: SError: { CommWake DevExch }
[    7.095259] ata1: limiting SATA link speed to 1.5 Gbps
[    7.101366] ata1: hard resetting link
[    9.311599] ata1: SATA link down (SStatus 1 SControl 310)
[    9.325961] ata1: EH complete
[    9.339887] ata1: exception Emask 0x10 SAct 0x0 SErr 0x4040000 action 0xe frozen
[    9.346637] ata1: irq_stat 0x00000040, connection status changed
[    9.352832] ata1: SError: { CommWake DevExch }
[    9.358896] ata1: limiting SATA link speed to 1.5 Gbps
[    9.365048] ata1: hard resetting link
...
repeat in a loop
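The retry cadence in the bad-start log can be measured from the kernel timestamps. A minimal sketch, using only the three "link down" lines copied from the log above; the function names are illustrative, not part of any tool.

```python
import re

# Three "SATA link down" lines copied from the bad-start log above.
LOG = """\
[    4.822384] ata1: SATA link down (SStatus 1 SControl 300)
[    7.048589] ata1: SATA link down (SStatus 1 SControl 300)
[    9.311599] ata1: SATA link down (SStatus 1 SControl 310)
"""

# Captures the timestamp in seconds and the port name.
LINE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(ata\d+): SATA link down")


def link_down_times(text, port="ata1"):
    """Collect timestamps of link-down messages for one port."""
    times = []
    for line in text.splitlines():
        match = LINE.match(line)
        if match and match.group(2) == port:
            times.append(float(match.group(1)))
    return times


def gaps(times):
    """Differences between consecutive timestamps, in seconds."""
    return [round(later - earlier, 6) for earlier, later in zip(times, times[1:])]


print(gaps(link_down_times(LOG)))
```

For these three lines the gaps come out a little above 2.2 seconds each, so the reset loop runs at a steady pace.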
#14
Thank you Nikolay for extracting the above information from the logs.

The good start happened after I connected over SSH following the bad start and ran the reboot command, so there were no changes to cables or anything inside.

I just tried connecting the hard drives to an ATX power supply: I turned on the power supply, waited for the drives to spin up, and then plugged the DC power supply into the socket, which turned on the RockPro64. Same problem, bad start. I tried the same thing an hour ago but couldn't reproduce it, so it has to be time related: sometimes I need to leave it off for 5-10 minutes, and sometimes for an hour or overnight.


Do you think I just need to find a DC jack of the same size and connect it to the ATX power supply to be 100% sure that it is not a power issue?
#15
What is the difference? This is what I see:

1. The ATA initialization time differs. Good is at about 2.9 seconds, bad is at about 2.6 seconds.

2. In the bad case the file system is mounted at 2.87 s without any additional log messages before it and with no changes from defaults. In the bad case the RAID is not detected and data access fails.

3. In the good case the file system mount is preceded by RAID detection, then the ATA information is printed, and only then is the file system mounted at 3.51 s.

4. Maybe a key difference: in the good case the SATA link is trained at 3.0 Gbps, not at 6 Gbps as might be expected by default for a 6 Gbps controller and 6 Gbps HDDs. SStatus=123 means "Interface in active state", "Generation 2 communication rate negotiated" and "Device presence detected and Phy communication established".

5. The bad case falls into a loop of hard resets with the link completely down. SStatus=1 means that device presence is detected but Phy communication is not established; no speed is negotiated and the interface is not in the active state.

6. After the first link down, the FAT-fs warning was issued (see 5.317967). It is the same warning as in the good start. That means the file system is trying to operate between the interface failures.

7. At the first failure the interface control value is SControl=300, which means no speed negotiation restrictions. The good start also reports SControl=300, so its 3.0 Gbps rate was negotiated, not imposed. The next failures happen with SControl=310, which limits speed negotiation to Generation 1 (i.e. SATA I, 1.5 Gbps). But the failures keep repeating.

8. A few milliseconds after each link-down message the controller reports an exception with the CommWake flag set. This means the HDD is trying to get online. The gap between the failures is about 2.2 s. It seems to me the file system is trying to operate between the failures.
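The SStatus and SControl values in the list above can be split into their fields mechanically. This is a sketch under two assumptions: that the kernel prints these registers in hexadecimal, and that the fields follow the usual SATA layout (DET in bits 3:0, SPD in bits 7:4, IPM in bits 11:8). The function name `split_fields` and the short descriptions are mine, not taken from the kernel.

```python
# SStatus field meanings (short paraphrases of the SATA register description).
SSTATUS_DET = {
    0: "no device detected, no Phy communication",
    1: "device presence detected, Phy communication not established",
    3: "device presence detected, Phy communication established",
    4: "Phy offline",
}
SSTATUS_SPD = {
    0: "no speed negotiated",
    1: "Generation 1 (1.5 Gbps)",
    2: "Generation 2 (3.0 Gbps)",
    3: "Generation 3 (6.0 Gbps)",
}
SSTATUS_IPM = {
    0: "not present or no communication",
    1: "active",
    2: "partial",
    6: "slumber",
}
# SControl speed field: a limit on negotiation, not a result.
SCONTROL_SPD = {
    0: "no speed restriction",
    1: "limit to Generation 1",
    2: "limit to Generation 2",
    3: "limit to Generation 3",
}


def split_fields(hex_text):
    """Split a register value, given as a hex string from dmesg, into its fields."""
    value = int(hex_text, 16)
    return {"DET": value & 0xF, "SPD": (value >> 4) & 0xF, "IPM": (value >> 8) & 0xF}


for text in ("123", "133", "1"):
    f = split_fields(text)
    print("SStatus", text, "->", SSTATUS_DET[f["DET"]], "|",
          SSTATUS_SPD[f["SPD"]], "|", SSTATUS_IPM[f["IPM"]])

for text in ("300", "310"):
    f = split_fields(text)
    print("SControl", text, "->", SCONTROL_SPD[f["SPD"]])
```

Read this way, 123 and 133 differ only in the negotiated generation, and 300 versus 310 differ only in the speed limit requested by the host.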

It is clear that the SATA controller and the HDDs both support 6 Gbps Generation 3 speed, yet together they work normally at 3 Gbps.
It is unclear why the bad-start failures repeat even at the lowest interface speed, 1.5 Gbps.
I think there is some kind of hardware problem. The interface failures between the SATA controller chip and the HDDs are definitely hardware related. It may be a power supply problem, or it may be signal integrity (bad cable, noise).

Looking at the photo of the "ROCKPro64 PCI-e to Dual SATA-II Interface Card" (I have not received mine yet), I see ugly jumpers in... ta-da! The 6 Gbps signal path! This is the same kind of design error as SBC mounting holes without metallization, maybe even worse. It seems I have found a job for your soldering iron!
If you are not planning to return the board (or can't for some reason), try to desolder the jumper pins completely. You can heat the pad on the controller PCB until the solder melts while pulling the pin out with tweezers. You need to remove foreign parts from a very fast and delicate signal path. After removing the jumper pins, solder in small, short, U-shaped solid-wire links in place of the jumpers, in the required direction. The solid-wire links should be short. The U-shape is only for convenience, to keep the links in the PCB holes while soldering. Before inserting the new links you need to clear the solder from the holes with desoldering wick. Take care not to desolder or damage anything else on the board.

(05-31-2019, 03:32 PM)vecnar Wrote: I just tried to connect hard drives to atx power supply, turn on power supply and wait for hard drives to spinup and plugin dc power supply into socket which turned on rockpro64 and same problem, bad start.

So, this is not a problem of power capacity. Your HDDs require 1.75 A peak each. That is 3.5 A peak at 12 V. Any 5 A-rated power supply should be strong enough to start your drives. Make sure the HDD cases have good contact with the SBC ground. This is important because the ATX power supply is a separate device and you may "catch" a ground loop with plenty of high-frequency noise between the power supplies. The absence of ground metallization at the SBC holes does not help here.
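The power estimate above is simple arithmetic and can be written out as a quick check. The figures (two drives, 1.75 A peak each, a 5 A supply, 12 V rail) are the ones stated in this post; the function name is only illustrative.

```python
def peak_current(drive_count, peak_amps_per_drive):
    """Total peak current on the 12 V rail if all drives spin up at once."""
    return drive_count * peak_amps_per_drive


RAIL_VOLTS = 12.0      # 12 V rail feeding the drives
SUPPLY_AMPS = 5.0      # rating of the supply considered in the post

total_amps = peak_current(2, 1.75)
print("peak current:", total_amps, "A")
print("peak power:", total_amps * RAIL_VOLTS, "W")
print("headroom:", SUPPLY_AMPS - total_amps, "A")
```

This covers only the spin-up peak of the drives on the 12 V rail; whatever else draws from the same supply comes on top of it.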

Quote:I tried the same thing an hour ago but I couldn't reproduce so it has to be time related as sometimes i need to leave it off for 5-10 minutes and sometimes for an hour or overnight.

I think this may be a hardware problem. When devices are operated beyond their limits (6 Gbps through 1/10" through-hole jumpers), any valid, subtle software difference may produce a failure. At the moment I still think the problem is hardware: the SATA signal path (with the jumpers, for sure) and, maybe, power supply decoupling (a lack of good capacitors on the controller board). Ground loops and their noise may play their role too.

Quote:Do you think i just need to find same size dc jack and connect to atx power supply to be 100% that it is not power issue?


No. First of all, remove the jumpers, which are ugly at 6 Gbps.

BTW, Armbian initializes the ATA subsystem at 1.8 seconds and brings the drives up at 6 Gbps:

[ 3.288364] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 3.808325] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)

And then it works without writing any errors to the system log.

Also an interesting thing, look at the Serial ATA specification:

Page 77. The SATA interface is guaranteed to tolerate only 100 mV of common-mode noise (a voltage difference between the controller and the HDD). This is enough for a well-grounded system, but it may be sensitive to ground noise when chassis grounding is bad.
Page 88, paragraph "Matching". It says that both sides, the controller and the HDD, shall perform impedance matching. I believe that jumpers in the SATA path may introduce a larger impedance discontinuity than certain integrated circuits can compensate. Usually impedance matching or time domain reflectometry (more complex than simple impedance matching) is tuned in coarse steps, not continuously. So sometimes the correction will allow the signal to be received, and sometimes it will not.

A more verbose log level is needed. Maybe some ATA code tunes the SATA controller somehow differently? I do not know; I am not an expert (yet).
Anyway, with a good signal path the SATA interface will work with default settings.
#16
Hello Nikolay,

Thanks a lot for such in-depth detective work! It is like a college assignment, and you should be getting some kind of certificate for it.
I will try soldering, even if I destroy the board, as the pins are tiny and I am not good at soldering; I have only done basic wiring in cars and motorbikes in the past. But I would like to know for sure whether it is the SATA card, and if I fail this way I will have to get another card anyway, as my soldering will not hold for long. I will look for cards without jumpers, if that turns out to be the thing to watch out for.
The multimeter, set to the 200 range on the resistance scale, shows 00.5 between the two posts at the back of the SATA controller on the pins joined by the jumper, but I am sure it is not accurate, as a 20 cm cable shows the same resistance.

Have a good weekend. I will probably update you on my progress on Sunday.
#17
Vecnar, you are like Doubting Thomas. How many times and in what way must I explain to you that high-speed signal problems can't be diagnosed with a DC ohmmeter?
Look at the complexity of the subject in this presentation about high-speed interconnects:

http://suddendocs.samtec.com/literature/...ndbook.pdf

And do not write anymore: "I did the check with a DC ohmmeter. The signal path is OK for 6 Gbps"! It looks insanely naive.
#18
Hello Nikolay,
I was just looking for a way to check whether soldering in a shorter wire without a jumper would make any difference. I was using the multimeter for a basic check of connectivity, looking for confirmation that what I did made a difference without going into details; I did not mention the multimeter to show off knowledge of any kind.
I am not an electronics engineer and never will be. Sorry if my mentioning the multimeter insulted you.
Your last post really discourages me from even trying the soldering now, but I will try later in the day and get back with news.
#19
Okay, no problem. It is hard for a customer to do a developer's work.
The shorter wires are needed to eliminate crosstalk between the digital lines. The usual plastic-cased jumpers are too large and have very high capacitance (edit: for signals of such high frequency as the 6 Gbps SATA protocol). So, placed close together on the signal pairs, the jumpers distort the signals. One signal, say, the transmission from the SATA adapter to the HDD, will induce crosstalk in the next jumper in the row, on the receive signal. And vice versa. There may be two kinds of distortion: crosstalk between receive and transmit, and distortion of the signal shape (eye-diagram closing) due to reflections from the discontinuity introduced by the jumpers.
These jumpers are the most suspicious part.

Do not touch the SBC with a soldering iron yet. The jumper replacement will probably help.

BTW, you may ask a specialist at the nearest phone repair shop to replace the jumpers. They usually have good enough soldering skills.
#20
Thank you for the above post Nikolay.
Regarding the soldering wire, I cannot find solid wire thin enough to fit. Can I use braided copper wire, or take the existing posts from one side, bend them into a U shape and use those?

I will try soldering myself, and if I am not successful I will order a better card, as paying for soldering here would cost twice as much as ordering a better card.

