Debian memory fragment - ice00 - 05-25-2016
hi,
I'm using Debian with the 3.10.101 kernel tree (not 3.10.65) and have the 2GB board.
The server runs coin mining pool software (already used on a Raspberry Pi 2 with Raspbian without problems).
I get kernel out-of-memory messages due to fragmentation (some Python scripts of the pool use a lot of contiguous memory).
Code: May 25 20:16:27 localhost kernel: [187398.049922] python2.7: page allocation failure: order:1, mode:0x200020
May 25 20:16:27 localhost kernel: [187398.057186] CPU: 1 PID: 5604 Comm: python2.7 Tainted: G O 3.10.101-0-pine64-longsleep #39
May 25 20:16:27 localhost kernel: [187398.069955] [<ffffffc000088388>] dump_backtrace+0x0/0x118
May 25 20:16:27 localhost kernel: [187398.075965] [<ffffffc0000884c4>] show_stack+0x24/0x30
May 25 20:16:27 localhost kernel: [187398.081592] [<ffffffc0006be948>] dump_stack+0x20/0x28
May 25 20:16:27 localhost kernel: [187398.087213] [<ffffffc00015fb58>] warn_alloc_failed+0x118/0x130
May 25 20:16:27 localhost kernel: [187398.093682] [<ffffffc000162d38>] __alloc_pages_nodemask+0x6b8/0x834
May 25 20:16:27 localhost kernel: [187398.100629] [<ffffffc00019deb4>] cache_alloc_refill+0x370/0x6ac
May 25 20:16:27 localhost kernel: [187398.107193] [<ffffffc00019e4a0>] kmem_cache_alloc+0x9c/0x200
May 25 20:16:27 localhost kernel: [187398.113472] [<ffffffc0005bc584>] sk_prot_alloc+0x3c/0x10c
May 25 20:16:27 localhost kernel: [187398.119465] [<ffffffc0005be94c>] sk_clone_lock+0x34/0x290
May 25 20:16:27 localhost kernel: [187398.125460] [<ffffffc000607e58>] inet_csk_clone_lock+0x30/0xb0
May 25 20:16:27 localhost kernel: [187398.131930] [<ffffffc00061e588>] tcp_create_openreq_child+0x38/0x34c
May 25 20:16:27 localhost kernel: [187398.138970] [<ffffffc00061b64c>] tcp_v4_syn_recv_sock+0x4c/0x234
May 25 20:16:27 localhost kernel: [187398.145628] [<ffffffc00061eca8>] tcp_check_req+0x304/0x4c0
May 25 20:16:27 localhost kernel: [187398.151717] [<ffffffc00061b9ac>] tcp_v4_do_rcv+0x178/0x330
May 25 20:16:27 localhost kernel: [187398.157803] [<ffffffc00061de18>] tcp_v4_rcv+0x4c0/0x6ec
May 25 20:16:27 localhost kernel: [187398.163607] [<ffffffc0005fcbb8>] ip_local_deliver_finish+0x13c/0x240
May 25 20:16:27 localhost kernel: [187398.170642] [<ffffffc0005fd140>] ip_local_deliver+0x88/0x9c
May 25 20:16:27 localhost kernel: [187398.176826] [<ffffffc0005fcfa0>] ip_rcv_finish+0x2e4/0x2f4
May 25 20:16:27 localhost kernel: [187398.182914] [<ffffffc0005fd3ec>] ip_rcv+0x298/0x2dc
May 25 20:16:27 localhost kernel: [187398.188338] [<ffffffc0005cf82c>] __netif_receive_skb_core+0x650/0x6d0
May 25 20:16:27 localhost kernel: [187398.195472] [<ffffffc0005cf8d4>] __netif_receive_skb+0x28/0x84
May 25 20:16:27 localhost kernel: [187398.201942] [<ffffffc0005d089c>] process_backlog+0xbc/0x194
May 25 20:16:27 localhost kernel: [187398.208124] [<ffffffc0005d06c4>] net_rx_action+0xe0/0x1fc
May 25 20:16:27 localhost kernel: [187398.214122] [<ffffffc0000b5ff8>] __do_softirq+0x160/0x2d8
May 25 20:16:27 localhost kernel: [187398.220118] [<ffffffc0000b625c>] do_softirq+0x54/0x64
May 25 20:16:27 localhost kernel: [187398.225733] [<ffffffc0000b62fc>] local_bh_enable_ip+0x90/0xbc
May 25 20:16:27 localhost kernel: [187398.232105] [<ffffffc0006c3890>] _raw_spin_unlock_bh+0x44/0x50
May 25 20:16:27 localhost kernel: [187398.238579] [<ffffffc0005bdc78>] release_sock+0x104/0x130
May 25 20:16:27 localhost kernel: [187398.244576] [<ffffffc00062e478>] inet_stream_connect+0x54/0x68
May 25 20:16:27 localhost kernel: [187398.251043] [<ffffffc0005bab78>] SyS_connect+0x8c/0xb4
May 25 20:16:27 localhost kernel: [187398.256748] Mem-Info:
May 25 20:16:27 localhost kernel: [187398.259321] DMA per-cpu:
May 25 20:16:27 localhost kernel: [187398.262178] CPU 0: hi: 186, btch: 31 usd: 28
May 25 20:16:27 localhost kernel: [187398.267506] CPU 1: hi: 186, btch: 31 usd: 0
May 25 20:16:27 localhost kernel: [187398.272835] CPU 2: hi: 186, btch: 31 usd: 0
May 25 20:16:27 localhost kernel: [187398.278161] CPU 3: hi: 186, btch: 31 usd: 62
May 25 20:16:27 localhost kernel: [187398.283495] active_anon:202579 inactive_anon:112366 isolated_anon:23
May 25 20:16:27 localhost kernel: [187398.283495] active_file:76997 inactive_file:88954 isolated_file:0
May 25 20:16:27 localhost kernel: [187398.283495] unevictable:27 dirty:265 writeback:0 unstable:0
May 25 20:16:27 localhost kernel: [187398.283495] free:6228 slab_reclaimable:5357 slab_unreclaimable:4630
May 25 20:16:27 localhost kernel: [187398.283495] mapped:12866 shmem:19578 pagetables:1549 bounce:0
May 25 20:16:27 localhost kernel: [187398.283495] free_cma:721
May 25 20:16:27 localhost kernel: [187398.320086] DMA free:23920kB min:5612kB low:7012kB high:8416kB active_anon:810316kB inactive_anon:449464kB active_file:307988kB inactive_file:356808kB unevictable:108kB isolated(anon):92kB isolated(file):0kB present:2080768kB managed:1969892kB mlocked:108kB dirty:1060kB writeback:0kB mapped:51464kB shmem:78312kB slab_reclaimable:21428kB slab_unreclaimable:18520kB kernel_stack:3472kB pagetables:6196kB unstable:0kB bounce:0kB free_cma:2884kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
May 25 20:16:27 localhost kernel: [187398.368043] lowmem_reserve[]: 0 0 0
May 25 20:16:27 localhost kernel: [187398.371956] DMA: 4021*4kB (UEMC) 395*8kB (UEMC) 52*16kB (UMRC) 28*32kB (MR) 10*64kB (MR) 3*128kB (M) 1*256kB (M) 1*512kB (R) 1*1024kB (R) 0*2048kB 0*4096kB = 23788kB
May 25 20:16:27 localhost kernel: [187398.388272] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
May 25 20:16:27 localhost kernel: [187398.397584] 196748 total pagecache pages
May 25 20:16:27 localhost kernel: [187398.401963] 10852 pages in swap cache
May 25 20:16:27 localhost kernel: [187398.406055] Swap cache stats: add 302607, delete 291755, find 506797/524929
May 25 20:16:27 localhost kernel: [187398.413750] Free swap = 328416kB
May 25 20:16:27 localhost kernel: [187398.417459] Total swap = 524284kB
May 25 20:16:27 localhost kernel: [187398.421172] SLAB: Unable to allocate memory on node 0 (gfp=0x20)
May 25 20:16:27 localhost kernel: [187398.427825] cache: TCP, object size: 1728, order: 1
May 25 20:16:27 localhost kernel: [187398.433438] node 0: slabs: 11/11, objs: 44/44, free: 0
Here the actual memory state (I also have a 512MB swap area):
Code: top - 20:30:14 up 2 days, 5:07, 1 user, load average: 2.14, 2.36, 2.47
Tasks: 143 total, 1 running, 142 sleeping, 0 stopped, 0 zombie
%Cpu(s): 12.2 us, 5.6 sy, 30.0 ni, 31.6 id, 20.6 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem: 2036960 total, 2013580 used, 23380 free, 2652 buffers
KiB Swap: 524284 total, 195868 used, 328416 free. 727200 cached Mem
Here the fragmentation:
Code: root@pine64pro:# cat /proc/buddyinfo
Node 0, zone DMA 1420 162 62 11 5 1 0 1 1 0 0
Any idea about what to do to prevent these errors?
Thanks
RE: Debian memory fragment - longsleep - 05-25-2016
Mhm, I do not see why this error would be specific to the Pine64. It might be related to aarch64 though. Could you try to pretty-print the output of /proc/buddyinfo (e.g. with a script like the one in http://andorian.blogspot.de/2014/03/making-sense-of-procbuddyinfo.html)? Especially compare the output directly after boot with the output while your software is running, and check how much it tries to allocate.
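The idea is simply that each column of /proc/buddyinfo counts free blocks of order N, i.e. runs of 2^N contiguous 4kB pages. A rough sketch of such a pretty-printer (not the linked script, just to illustrate the idea):
Code: # rough /proc/buddyinfo pretty-printer: one line per zone, one row per block order
PAGE_KB = 4  # base page size on this kernel

with open("/proc/buddyinfo") as f:
    for line in f:
        fields = line.split()
        node, zone = fields[1].rstrip(","), fields[3]
        counts = [int(n) for n in fields[4:]]  # free blocks per order 0..N
        print("Node %s, zone %s" % (node, zone))
        total_kb = 0
        for order, count in enumerate(counts):
            block_kb = PAGE_KB * (2 ** order)  # size of one block of this order
            total_kb += count * block_kb
            print("  %8d kB blocks: %6d free (%d kB)" % (block_kb, count, count * block_kb))
        print("  total free: %d kB" % total_kb)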
RE: Debian memory fragment - ice00 - 05-26-2016
Hi,
thanks, I will test after I can make a reboot (at least in the weekend).
I do not know if it is related to aarch64, but on the Raspberry Pi 2 (32-bit, 1GB of RAM, with 256MB of swap if I remember correctly) I never got such a message in 6 months of running the server 24/7.
The only difference in swap memory is that the Raspbian installation probably created a dedicated swap partition, while on the Pine64 I have a 512MB swap file that I created myself.
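For reference, a 512MB swap file can be created roughly like this (the /swapfile path is only an example, not necessarily the location I used):
Code: # sketch: create and enable a 512 MB swap file (run as root)
import subprocess

SWAPFILE = "/swapfile"  # example path

# allocate 512 MB of zeroed space, restrict permissions, format and enable it
subprocess.check_call(["dd", "if=/dev/zero", "of=" + SWAPFILE, "bs=1M", "count=512"])
subprocess.check_call(["chmod", "600", SWAPFILE])
subprocess.check_call(["mkswap", SWAPFILE])
subprocess.check_call(["swapon", SWAPFILE])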
RE: Debian memory fragment - ice00 - 05-28-2016
1 minute after the boot:
Code: root@pine64pro:/home/debian# python /root/fragment.py
2016-05-28 14:27:55,939 - main - INFO - Starting....
2016-05-28 14:27:55,939 - main - INFO - Parsed options: {'size': None}
<logging.Logger object at 0x7fa7a844d0>
2016-05-28 14:27:55,940 - main - DEBUG - Parsing line: Node 0, zone DMA 52 13 9 6 3 1 2 5 5 3 101
2016-05-28 14:27:55,943 - main - DEBUG - Parsed line: {'numa_node': '0', 'zone': 'DMA', 'nr_free': '52 13 9 6 3 1 2 5 5 3 101'}
2016-05-28 14:27:55,944 - main - INFO - defaultdict(<type 'list'>, {0: [{'usage': [212992, 106496, 147456, 196608, 196608, 131072, 524288, 2621440, 5242880, 6291456, 423624704], 'nr_free': [52, 13, 9, 6, 3, 1, 2, 5, 5, 3, 101], 'zone': 'DMA', 'sz_fragment': [4096, 8192, 16384, 32768, 65536, 131072, 262144, 524288, 1048576, 2097152, 4194304]}]})
Node: 0
Zone: DMA
Free KiB in zone: 429000.00
Fragment size Free fragments Total available KiB
4096 52 208.0
8192 13 104.0
16384 9 144.0
32768 6 192.0
65536 3 192.0
131072 1 128.0
262144 2 512.0
524288 5 2560.0
1048576 5 5120.0
2097152 3 6144.0
4194304 101 413696.0
And now, some hours later, after I activated zram compression and lowered swappiness from the default 60 to 1 (a sketch of the change is at the end of this post):
Code: root@pine64pro:/home/debian# python /root/fragment.py
2016-05-28 16:55:37,433 - main - INFO - Starting....
2016-05-28 16:55:37,433 - main - INFO - Parsed options: {'size': None}
<logging.Logger object at 0x7fb41324d0>
2016-05-28 16:55:37,436 - main - DEBUG - Parsing line: Node 0, zone DMA 888 1094 404 169 57 22 13 3 0 1 1
2016-05-28 16:55:37,439 - main - DEBUG - Parsed line: {'numa_node': '0', 'zone': 'DMA', 'nr_free': '888 1094 404 169 57 22 13 3 0 1 1'}
2016-05-28 16:55:37,440 - main - INFO - defaultdict(<type 'list'>, {0: [{'usage': [3637248, 8962048, 6619136, 5537792, 3735552, 2883584, 3407872, 1572864, 0, 2097152, 4194304], 'nr_free': [888, 1094, 404, 169, 57, 22, 13, 3, 0, 1, 1], 'zone': 'DMA', 'sz_fragment': [4096, 8192, 16384, 32768, 65536, 131072, 262144, 524288, 1048576, 2097152, 4194304]}]})
Node: 0
Zone: DMA
Free KiB in zone: 41648.00
Fragment size Free fragments Total available KiB
4096 888 3552.0
8192 1094 8752.0
16384 404 6464.0
32768 169 5408.0
65536 57 3648.0
131072 22 2816.0
262144 13 3328.0
524288 3 1536.0
1048576 0 0.0
2097152 1 2048.0
4194304 1 4096.0
I have not had any problem since then.
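For reference, the kind of change I mean looks roughly like this (the paths are the standard sysfs/procfs ones; the 256MB zram size is only an example value):
Code: # sketch: add a zram swap device and lower vm.swappiness (run as root)
import subprocess

ZRAM_BYTES = 256 * 1024 * 1024  # example size for the compressed swap device

# load the zram module and size the first device
subprocess.check_call(["modprobe", "zram"])
with open("/sys/block/zram0/disksize", "w") as f:
    f.write(str(ZRAM_BYTES))

# format it as swap and enable it with a higher priority than the swap file
subprocess.check_call(["mkswap", "/dev/zram0"])
subprocess.check_call(["swapon", "-p", "100", "/dev/zram0"])

# lower swappiness from the default 60 to 1
with open("/proc/sys/vm/swappiness", "w") as f:
    f.write("1")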
Thanks