Dec 20 21:23:45 vgfs001 kernel: tiotest_AMD_x86 invoked oom-killer: gfp_mask=0x200da, order=0, oom_adj=0, oom_score_adj=0
Dec 20 21:23:45 vgfs001 kernel: tiotest_AMD_x86 cpuset=/ mems_allowed=0
Dec 20 21:23:45 vgfs001 kernel: Pid: 1937, comm: tiotest_AMD_x86 Not tainted 2.6.32-431.29.2.lustre.el6.x86_64 #1
Dec 20 21:23:45 vgfs001 kernel: Call Trace:
Dec 20 21:23:45 vgfs001 kernel: [] ? cpuset_print_task_mems_allowed+0x91/0xb0
Dec 20 21:23:45 vgfs001 kernel: [ ] ? dump_header+0x90/0x1b0
Dec 20 21:23:45 vgfs001 kernel: [ ] ? security_real_capable_noaudit+0x3c/0x70
Dec 20 21:23:45 vgfs001 kernel: [ ] ? oom_kill_process+0x82/0x2a0
Dec 20 21:23:45 vgfs001 kernel: [ ] ? select_bad_process+0xe1/0x120
Dec 20 21:23:45 vgfs001 kernel: [ ] ? out_of_memory+0x220/0x3c0
Dec 20 21:23:45 vgfs001 kernel: [ ] ? __alloc_pages_nodemask+0x89f/0x8d0
Dec 20 21:23:45 vgfs001 kernel: [ ] ? alloc_pages_current+0xaa/0x110
Dec 20 21:23:45 vgfs001 kernel: [ ] ? __page_cache_alloc+0x87/0x90
Dec 20 21:23:45 vgfs001 kernel: [ ] ? grab_cache_page_write_begin+0x8e/0xc0
Dec 20 21:23:45 vgfs001 kernel: [ ] ? ll_write_begin+0x58/0x1a0 [lustre]
Dec 20 21:23:45 vgfs001 kernel: [ ] ? generic_file_buffered_write+0x123/0x2e0
Dec 20 21:23:45 vgfs001 kernel: [ ] ? current_fs_time+0x27/0x30
Dec 20 21:23:45 vgfs001 kernel: [ ] ? __generic_file_aio_write+0x260/0x490
Dec 20 21:23:45 vgfs001 kernel: [ ] ? cl_env_info+0x15/0x20 [obdclass]
Dec 20 21:23:45 vgfs001 kernel: [ ] ? generic_file_aio_write+0x88/0x100
Dec 20 21:23:45 vgfs001 kernel: [ ] ? vvp_io_write_start+0x137/0x2a0 [lustre]
Dec 20 21:23:45 vgfs001 kernel: [ ] ? cl_io_start+0x6a/0x140 [obdclass]
Dec 20 21:23:45 vgfs001 kernel: [ ] ? cl_io_loop+0xb4/0x1b0 [obdclass]
Dec 20 21:23:45 vgfs001 kernel: [ ] ? ll_file_io_generic+0x2a6/0x610 [lustre]
Dec 20 21:23:45 vgfs001 kernel: [ ] ? ll_file_aio_write+0x142/0x2c0 [lustre]
Dec 20 21:23:45 vgfs001 kernel: [ ] ? ll_file_write+0x16c/0x2a0 [lustre]
Dec 20 21:23:45 vgfs001 kernel: [ ] ? vfs_write+0xb8/0x1a0
Dec 20 21:23:45 vgfs001 kernel: [ ] ? sys_write+0x51/0x90
Dec 20 21:23:45 vgfs001 kernel: [ ] ? __audit_syscall_exit+0x25e/0x290
Dec 20 21:23:45 vgfs001 kernel: [ ] ? system_call_fastpath+0x16/0x1b
Dec 20 21:23:45 vgfs001 kernel: Mem-Info:
Dec 20 21:23:45 vgfs001 kernel: Node 0 DMA per-cpu:
Dec 20 21:23:45 vgfs001 kernel: CPU 0: hi: 0, btch: 1 usd: 0
Dec 20 21:23:45 vgfs001 kernel: CPU 1: hi: 0, btch: 1 usd: 0
Dec 20 21:23:45 vgfs001 kernel: CPU 2: hi: 0, btch: 1 usd: 0
Dec 20 21:23:45 vgfs001 kernel: CPU 3: hi: 0, btch: 1 usd: 0
Dec 20 21:23:45 vgfs001 kernel: CPU 4: hi: 0, btch: 1 usd: 0
Dec 20 21:23:45 vgfs001 kernel: CPU 5: hi: 0, btch: 1 usd: 0
Dec 20 21:23:45 vgfs001 kernel: CPU 6: hi: 0, btch: 1 usd: 0
Dec 20 21:23:45 vgfs001 kernel: CPU 7: hi: 0, btch: 1 usd: 0
Dec 20 21:23:45 vgfs001 kernel: CPU 8: hi: 0, btch: 1 usd: 0
Dec 20 21:23:45 vgfs001 kernel: CPU 9: hi: 0, btch: 1 usd: 0
Dec 20 21:23:45 vgfs001 kernel: CPU 10: hi: 0, btch: 1 usd: 0
Dec 20 21:23:45 vgfs001 kernel: CPU 11: hi: 0, btch: 1 usd: 0
Dec 20 21:23:45 vgfs001 kernel: Node 0 DMA32 per-cpu:
Dec 20 21:23:45 vgfs001 kernel: CPU 0: hi: 186, btch: 31 usd: 0
Dec 20 21:23:45 vgfs001 kernel: CPU 1: hi: 186, btch: 31 usd: 0
Dec 20 21:23:45 vgfs001 kernel: CPU 2: hi: 186, btch: 31 usd: 0
Dec 20 21:23:45 vgfs001 kernel: CPU 3: hi: 186, btch: 31 usd: 0
Dec 20 21:23:45 vgfs001 kernel: CPU 4: hi: 186, btch: 31 usd: 0
Dec 20 21:23:45 vgfs001 kernel: CPU 5: hi: 186, btch: 31 usd: 0
Dec 20 21:23:45 vgfs001 kernel: CPU 6: hi: 186, btch: 31 usd: 0
Dec 20 21:23:45 vgfs001 kernel: CPU 7: hi: 186, btch: 31 usd: 11
Dec 20 21:23:45 vgfs001 kernel: CPU 8: hi: 186, btch: 31 usd: 0
Dec 20 21:23:45 vgfs001 kernel: CPU 9: hi: 186, btch: 31 usd: 46
Dec 20 21:23:45 vgfs001 kernel: CPU 10: hi: 186, btch: 31 usd: 0
Dec 20 21:23:45 vgfs001 kernel: CPU 11: hi: 186, btch: 31 usd: 0
Dec 20 21:23:45 vgfs001 kernel: Node 0 Normal per-cpu:
Dec 20 21:23:45 vgfs001 kernel: CPU 0: hi: 186, btch: 31 usd: 2
Dec 20 21:23:45 vgfs001 kernel: CPU 1: hi: 186, btch: 31 usd: 7
Dec 20 21:23:45 vgfs001 kernel: CPU 2: hi: 186, btch: 31 usd: 27
Dec 20 21:23:45 vgfs001 kernel: CPU 3: hi: 186, btch: 31 usd: 0
Dec 20 21:23:45 vgfs001 kernel: CPU 4: hi: 186, btch: 31 usd: 0
Dec 20 21:23:45 vgfs001 kernel: CPU 5: hi: 186, btch: 31 usd: 39
Dec 20 21:23:45 vgfs001 kernel: CPU 6: hi: 186, btch: 31 usd: 33
Dec 20 21:23:45 vgfs001 kernel: CPU 7: hi: 186, btch: 31 usd: 0
Dec 20 21:23:45 vgfs001 kernel: CPU 8: hi: 186, btch: 31 usd: 1
Dec 20 21:23:45 vgfs001 kernel: CPU 9: hi: 186, btch: 31 usd: 35
Dec 20 21:23:45 vgfs001 kernel: CPU 10: hi: 186, btch: 31 usd: 29
Dec 20 21:23:45 vgfs001 kernel: CPU 11: hi: 186, btch: 31 usd: 2
Dec 20 21:23:45 vgfs001 kernel: active_anon:1198006 inactive_anon:171400 isolated_anon:96
Dec 20 21:23:45 vgfs001 kernel: active_file:548228 inactive_file:548497 isolated_file:0
Dec 20 21:23:45 vgfs001 kernel: unevictable:0 dirty:899 writeback:2342 unstable:0
Dec 20 21:23:45 vgfs001 kernel: free:29297 slab_reclaimable:10639 slab_unreclaimable:376601
Dec 20 21:23:45 vgfs001 kernel: mapped:1032 shmem:0 pagetables:5613 bounce:0
Dec 20 21:23:45 vgfs001 kernel: Node 0 DMA free:15708kB min:80kB low:100kB high:120kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15320kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Dec 20 21:23:45 vgfs001 kernel: lowmem_reserve[]: 0 3512 12097 12097
Dec 20 21:23:45 vgfs001 kernel: Node 0 DMA32 free:53892kB min:19596kB low:24492kB high:29392kB active_anon:4kB inactive_anon:44kB active_file:1249260kB inactive_file:1249288kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3596496kB mlocked:0kB dirty:3436kB writeback:4180kB mapped:0kB shmem:0kB slab_reclaimable:24608kB slab_unreclaimable:689432kB kernel_stack:8kB pagetables:196kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:4212142 all_unreclaimable? no
Dec 20 21:23:45 vgfs001 kernel: lowmem_reserve[]: 0 0 8585 8585
Dec 20 21:23:45 vgfs001 kernel: Node 0 Normal free:47588kB min:47900kB low:59872kB high:71848kB active_anon:4792020kB inactive_anon:685556kB active_file:943652kB inactive_file:944700kB unevictable:0kB isolated(anon):384kB isolated(file):0kB present:8791040kB mlocked:0kB dirty:160kB writeback:5188kB mapped:4128kB shmem:0kB slab_reclaimable:17948kB slab_unreclaimable:816972kB kernel_stack:5040kB pagetables:22256kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:2346101 all_unreclaimable? no
Dec 20 21:23:45 vgfs001 kernel: lowmem_reserve[]: 0 0 0 0
Dec 20 21:23:45 vgfs001 kernel: Node 0 DMA: 3*4kB 2*8kB 2*16kB 1*32kB 2*64kB 1*128kB 0*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15708kB
Dec 20 21:23:45 vgfs001 kernel: Node 0 DMA32: 183*4kB 19*8kB 19*16kB 19*32kB 24*64kB 17*128kB 7*256kB 5*512kB 27*1024kB 8*2048kB 0*4096kB = 53892kB
Dec 20 21:23:45 vgfs001 kernel: Node 0 Normal: 109*4kB 185*8kB 121*16kB 43*32kB 8*64kB 117*128kB 43*256kB 8*512kB 1*1024kB 1*2048kB 2*4096kB = 47084kB
Dec 20 21:23:45 vgfs001 kernel: 1269461 total pagecache pages
Dec 20 21:23:45 vgfs001 kernel: 172616 pages in swap cache
Dec 20 21:23:45 vgfs001 kernel: Swap cache stats: add 1017139, delete 844523, find 444300/457367
Dec 20 21:23:45 vgfs001 kernel: Free swap = 3377416kB
Dec 20 21:23:45 vgfs001 kernel: Total swap = 4194300kB
Dec 20 21:23:45 vgfs001 kernel: 3145727 pages RAM
Dec 20 21:23:45 vgfs001 kernel: 96633 pages reserved
Dec 20 21:23:45 vgfs001 kernel: 9844603 pages shared
Dec 20 21:23:45 vgfs001 kernel: 528776 pages non-shared
Dec 20 21:23:45 vgfs001 kernel: [ pid ] uid tgid total_vm rss cpu oom_adj oom_score_adj name
Dec 20 21:23:45 vgfs001 kernel: [ 591] 0 591 2817 4 9 -17 -1000 udevd
Dec 20 21:23:45 vgfs001 kernel: [ 2028] 0 2028 6899 30 0 -17 -1000 auditd
Dec 20 21:23:45 vgfs001 kernel: [ 2058] 0 2058 63875 54 2 0 0 rsyslogd
Dec 20 21:23:45 vgfs001 kernel: [ 2088] 0 2088 2740 38 7 0 0 irqbalance
Dec 20 21:23:45 vgfs001 kernel: [ 2110] 32 2110 4744 22 1 0 0 rpcbind
Dec 20 21:23:45 vgfs001 kernel: [ 2229] 81 2229 8028 9 3 0 0 dbus-daemon
Dec 20 21:23:45 vgfs001 kernel: [ 2251] 29 2251 5837 10 2 0 0 rpc.statd
Dec 20 21:23:45 vgfs001 kernel: [ 2281] 0 2281 47351 11 7 0 0 cupsd
Dec 20 21:23:45 vgfs001 kernel: [ 2317] 0 2317 1020 8 0 0 0 acpid
Dec 20 21:23:45 vgfs001 kernel: [ 2327] 68 2327 9771 123 9 0 0 hald
Dec 20 21:23:45 vgfs001 kernel: [ 2328] 0 2328 5100 9 10 0 0 hald-runner
Dec 20 21:23:45 vgfs001 kernel: [ 2370] 0 2370 5630 8 7 0 0 hald-addon-inpu
Dec 20 21:23:45 vgfs001 kernel: [ 2376] 68 2376 4502 9 0 0 0 hald-addon-acpi
Dec 20 21:23:45 vgfs001 kernel: [ 2396] 0 2396 96535 42 11 0 0 automount
Dec 20 21:23:45 vgfs001 kernel: [ 2425] 0 2425 16671 8 4 -17 -1000 sshd
Dec 20 21:23:45 vgfs001 kernel: [ 2534] 0 2534 20331 28 4 0 0 master
Dec 20 21:23:45 vgfs001 kernel: [ 2549] 89 2549 20397 29 10 0 0 qmgr
Dec 20 21:23:45 vgfs001 kernel: [ 2562] 0 2562 28661 7 1 0 0 abrtd
Dec 20 21:23:45 vgfs001 kernel: [ 2577] 0 2577 27116 77 6 0 0 ksmtuned
Dec 20 21:23:45 vgfs001 kernel: [ 2589] 0 2589 29332 21 6 0 0 crond
Dec 20 21:23:45 vgfs001 kernel: [ 2638] 0 2638 5394 5 4 0 0 atd
Dec 20 21:23:45 vgfs001 kernel: [ 2649] 0 2649 104692 1712 3 0 0 python
Dec 20 21:23:45 vgfs001 kernel: [ 2666] 0 2666 257137 979 3 0 0 libvirtd
Dec 20 21:23:45 vgfs001 kernel: [ 2695] 0 2695 27085 6 5 0 0 rhsmcertd
Dec 20 21:23:45 vgfs001 kernel: [ 2796] 99 2796 3223 9 7 0 0 dnsmasq
Dec 20 21:23:45 vgfs001 kernel: [ 2802] 0 2802 16175 7 1 0 0 certmonger
Dec 20 21:23:45 vgfs001 kernel: [ 2824] 0 2824 33502 11 1 0 0 gdm-binary
Dec 20 21:23:45 vgfs001 kernel: [ 2840] 0 2840 1016 6 3 0 0 mingetty
Dec 20 21:23:45 vgfs001 kernel: [ 2842] 0 2842 1016 6 7 0 0 mingetty
Dec 20 21:23:45 vgfs001 kernel: [ 2844] 0 2844 1016 6 4 0 0 mingetty
Dec 20 21:23:45 vgfs001 kernel: [ 2846] 0 2846 1016 6 4 0 0 mingetty
Dec 20 21:23:45 vgfs001 kernel: [ 2850] 0 2850 1016 6 4 0 0 mingetty
Dec 20 21:23:45 vgfs001 kernel: [ 2862] 0 2862 3212 4 9 -17 -1000 udevd
Dec 20 21:23:45 vgfs001 kernel: [ 2863] 0 2863 3212 4 9 -17 -1000 udevd
Dec 20 21:23:45 vgfs001 kernel: [ 2911] 0 2911 41157 11 6 0 0 gdm-simple-slav
Dec 20 21:23:45 vgfs001 kernel: [ 2929] 0 2929 35211 911 2 0 0 Xorg
Dec 20 21:23:45 vgfs001 kernel: [ 2970] 0 2970 1029163 10 1 0 0 console-kit-dae
Dec 20 21:23:45 vgfs001 kernel: [ 3040] 42 3040 5010 5 9 0 0 dbus-launch
Dec 20 21:23:45 vgfs001 kernel: [ 3041] 42 3041 7951 10 0 0 0 dbus-daemon
Dec 20 21:23:45 vgfs001 kernel: [ 3043] 42 3043 67404 11 8 0 0 gnome-session
Dec 20 21:23:45 vgfs001 kernel: [ 3046] 0 3046 12497 11 3 0 0 devkit-power-da
Dec 20 21:23:45 vgfs001 kernel: [ 3052] 42 3052 33326 64 0 0 0 gconfd-2
Dec 20 21:23:45 vgfs001 kernel: [ 3069] 42 3069 91526 3293 8 0 0 gnome-settings-
Dec 20 21:23:45 vgfs001 kernel: [ 3070] 42 3070 30178 56 0 0 0 at-spi-registry
Dec 20 21:23:45 vgfs001 kernel: [ 3072] 42 3072 89614 11 6 0 0 bonobo-activati
Dec 20 21:23:45 vgfs001 kernel: [ 3080] 42 3080 33821 11 8 0 0 gvfsd
Dec 20 21:23:45 vgfs001 kernel: [ 3081] 42 3081 72400 92 0 0 0 metacity
Dec 20 21:23:45 vgfs001 kernel: [ 3084] 42 3084 68544 64 2 0 0 gnome-power-man
Dec 20 21:23:45 vgfs001 kernel: [ 3085] 42 3085 62195 10 6 0 0 polkit-gnome-au
Dec 20 21:23:45 vgfs001 kernel: [ 3087] 42 3087 96302 288 0 0 0 gdm-simple-gree
Dec 20 21:23:45 vgfs001 kernel: [ 3094] 0 3094 13186 10 9 0 0 polkitd
Dec 20 21:23:45 vgfs001 kernel: [ 3107] 42 3107 86550 9 5 0 0 pulseaudio
Dec 20 21:23:45 vgfs001 kernel: [ 3109] 499 3109 42114 25 10 0 0 rtkit-daemon
Dec 20 21:23:45 vgfs001 kernel: [ 3114] 0 3114 35562 11 6 0 0 gdm-session-wor
Dec 20 21:23:45 vgfs001 kernel: [27425] 0 27425 25109 40 3 0 0 sshd
Dec 20 21:23:45 vgfs001 kernel: [27430] 0 27430 27123 80 6 0 0 bash
Dec 20 21:23:45 vgfs001 kernel: [ 1567] 0 1567 1711609 1190642 1 0 0 lwfsd
Dec 20 21:23:45 vgfs001 kernel: [ 1691] 89 1691 20351 20 5 0 0 pickup
Dec 20 21:23:45 vgfs001 kernel: [ 1926] 0 1926 25227 25 8 0 0 sleep
Dec 20 21:23:45 vgfs001 kernel: [ 1927] 0 1927 46749 4269 7 0 0 tiotest_AMD_x86
Dec 20 21:23:45 vgfs001 kernel: Out of memory: Kill process 1567 (lwfsd) score 306 or sacrifice child
Dec 20 21:23:45 vgfs001 kernel: Killed process 1567, UID 0, (lwfsd) total-vm:6846436kB, anon-rss:4742528kB, file-rss:20040kB
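Here the OOM was triggered from the Lustre write path, but in practice other entry points, for example the KVM management program, can trigger it as well: any point that allocates memory can cause an OOM.
The analysis shows that the Lustre cache was indeed occupying a large amount of memory, which left too little available for new allocations.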
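The distinction between the task that hits the OOM condition and the task that gets killed can be read straight from the log. The following is a minimal sketch, assuming the log is available one message per line and uses the same message wording as the dump above; the function name `oom_summary` is hypothetical.

```shell
#!/bin/sh
# Print the task that invoked the OOM killer and the task chosen as victim.
# Reads kernel log text on stdin, one message per line. The patterns match
# the message wording seen in this 2.6.32 log; other kernels may differ.
oom_summary() {
    awk '
        / invoked oom-killer:/ {
            # The command name is the field just before "invoked".
            for (i = 1; i <= NF; i++)
                if ($i == "invoked") invoker = $(i - 1)
        }
        /Out of memory: Kill process/ {
            # "Kill process <pid> (<name>) score ..."
            for (i = 1; i <= NF; i++)
                if ($i == "process") { pid = $(i + 1); name = $(i + 2) }
        }
        END {
            gsub(/[()]/, "", name)
            printf "invoker=%s victim=%s pid=%s\n", invoker, name, pid
        }
    '
}
```

Something like `grep 'kernel:' /var/log/messages | oom_summary` (the log path is an assumption about this host) should report tiotest_AMD_x86 as the invoker and lwfsd as the victim for the dump above.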
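To put numbers on this, the global counters in the Mem-Info section are page counts and can be converted to MiB. This is a rough sketch that assumes 4 kB pages (the x86_64 default); the helper name `pages_to_mib` is hypothetical and the counter values are copied from the log above.

```shell
#!/bin/sh
# Convert a page count to MiB (integer division), assuming 4 kB pages.
pages_to_mib() {
    echo $(( $1 * 4 / 1024 ))
}

# Counters taken from the Mem-Info section of the OOM dump.
echo "anon (active+inactive):       $(pages_to_mib $((1198006 + 171400))) MiB"
echo "file cache (active+inactive): $(pages_to_mib $((548228 + 548497))) MiB"
echo "slab_unreclaimable:           $(pages_to_mib 376601) MiB"
echo "free:                         $(pages_to_mib 29297) MiB"
```

With these inputs the script prints 5349, 4284, 1471 and 114 MiB respectively, so on a 12GB guest roughly 4.2 GiB sat in the file cache and 1.4 GiB in unreclaimable slab at the moment of the kill, while only about 114 MiB was free.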
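Three measures were tried.

1. Increase the guest memory from 12GB to 16GB
virsh setmaxmem vgfsxxx 16GB --config
After the guest has been restarted:
virsh setmem vgfsxxx 16GB
This did not help: after a few test runs the service still went down.

2. Adjust the priority of the lwfsd service
Set the oom_adj of lwfsd to -17, the value that exempts a process from the OOM killer:
PID=`ps -e | grep lwfs | grep -v grep | awk '{print $1}'`
echo -17 > /proc/$PID/oom_adj
echo -17 > /proc/$PID/task/$PID/oom_adj
This seems to help.

3. Change the memory allocation policy
In addition, run echo "2" > /proc/sys/vm/overcommit_memory, so that an allocation succeeds only when enough space exists to back the mapping. This also seems to help to some degree; more test runs are needed to confirm.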
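For measure 3, it helps to know roughly how much memory can be committed once strict overcommit is on. In mode 2 the kernel caps committed memory at about swap plus overcommit_ratio percent of RAM. The sketch below is an estimate only: the helper name `commit_limit_kb` is hypothetical, the ratio of 50 is the kernel default and is assumed unchanged on this host, and the RAM figure is the "pages RAM" value from the log multiplied by 4 kB (the kernel itself uses a slightly smaller total).

```shell
#!/bin/sh
# Estimate the commit limit (in kB) under vm.overcommit_memory=2.
#   $1 = total swap in kB
#   $2 = total RAM in kB
#   $3 = vm.overcommit_ratio in percent
commit_limit_kb() {
    echo $(( $1 + $2 * $3 / 100 ))
}

# Values from the OOM dump: Total swap = 4194300kB, 3145727 pages RAM.
swap_kb=4194300
ram_kb=$(( 3145727 * 4 ))
echo "estimated commit limit: $(commit_limit_kb "$swap_kb" "$ram_kb" 50) kB"
```

On the running system the exact figures can be read with `grep -E 'CommitLimit|Committed_AS' /proc/meminfo`; once Committed_AS reaches CommitLimit, further allocations fail with ENOMEM instead of invoking the OOM killer later.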