MCE test suite HOWTO
====================

11 November 2008

Huang Ying

Section 4.2 (Test with kdump test driver) is based on the README of
the LTP kdump test case.

Abstract
--------

This document explains the structure and design of the MCE test suite,
the kernel patches and user space tools needed for automated testing,
how to use the suite, and how to add new test cases to it.

0. Quick shortcut
-----------------

- Install a Linux kernel with full MCE injection support, that is, the
  latest Linux kernel (2.6.31) plus the MCE injection enhancement
  patchset from http://ftp.kernel.org/pub/linux/kernel/people/yhuang/mce/.
  Make sure the following configuration options are enabled:

  CONFIG_X86_MCE=y
  CONFIG_X86_MCE_INTEL=y
  CONFIG_X86_MCE_INJECT=y or CONFIG_X86_MCE_INJECT=m

- Get the git version of mcelog from
  git://git.kernel.org/pub/scm/utils/cpu/mce/mcelog.git
  and install it in /usr/sbin (or, rather, wherever comes first in
  your $PATH):

  git clone git://git.kernel.org/pub/scm/utils/cpu/mce/mcelog.git
  cd mcelog
  make
  sudo make install

- Get the git version of mce-inject from
  git://git.kernel.org/pub/scm/utils/cpu/mce/mce-inject.git
  and install it:

  git clone git://git.kernel.org/pub/scm/utils/cpu/mce/mce-inject.git
  cd mce-inject
  make
  sudo make install

- Install the page-types tool (see section 3.4), which ships with the
  Linux kernel source (2.6.32 or newer):

  cd $KERNEL_SRC/Documentation/vm/
  gcc -o page-types page-types.c
  cp page-types /usr/bin/

- Run "make test".
  This runs the basic tests, but not the more complicated kdump ones.
  For more information on those, read below.
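
The prerequisites above can be sanity-checked before running "make
test". The following is only a minimal sketch and is not part of the
test suite: the function names are made up for illustration, and the
location of the kernel config file (often /boot/config-$(uname -r)) is
an assumption that varies between distributions.

```shell
#!/bin/sh
# Hypothetical helpers, not shipped with the test suite.

# Print each required MCE option that is not enabled in the given
# kernel config file. CONFIG_X86_MCE_INJECT may be built in or a module.
missing_mce_options()
{
    config_file=$1
    grep -q '^CONFIG_X86_MCE=y$' "$config_file" ||
        echo CONFIG_X86_MCE
    grep -q '^CONFIG_X86_MCE_INTEL=y$' "$config_file" ||
        echo CONFIG_X86_MCE_INTEL
    grep -Eq '^CONFIG_X86_MCE_INJECT=(y|m)$' "$config_file" ||
        echo CONFIG_X86_MCE_INJECT
}

# Print each named tool that cannot be found in $PATH.
missing_tools()
{
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}
```

For example, 'missing_mce_options "/boot/config-$(uname -r)"' and
'missing_tools mcelog mce-inject page-types' print nothing when
everything is in place.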

1. Introduction
---------------

The MCE test suite is a collection of tools and test scripts for
testing the MCE processing features of the Linux kernel. The goal is
to cover most of the kernel's MCE processing code paths and features
with automated tests.

If you just want to start testing as quickly as possible, you can skip
sections 2 and 3 and go directly to section 4.1.


2. Structure
------------

The main intention behind the design is to reuse test cases across
various test methods (represented as test drivers), such as the kdump
based method and the kernel MCE panic log (tolerant=3) based method.

2.1 Test cases

Test cases are grouped into test case classes. The test cases in one
class share similar methods for triggering, result collection and
result verification, and can be used with the same set of test
drivers. The interface of a test case class is a shell script, usually
named cases.sh, under a sub-directory of cases/. The test case class
shell script should support the following command line options:

cases.sh enumerate		enumerate test cases in class, print test
				case names to stdout
cases.sh trigger		trigger the test case
cases.sh get_result		get the result of test case
cases.sh verify			verify the result of test case, and print
				the verification result to stdout

When cases.sh [trigger|get_result|verify] is executed, the test case
is specified via the environment variable this_case, which must be one
of the test case names returned by "cases.sh enumerate".

Other environment variables are also used to pass information from
the driver to the test cases, such as:

this_case			name of the current test case
driver				name of test driver
klog				name of the file which holds the kernel log
				during test
KSRC_DIR (for gcov)		kernel source code directory
GCOV (for gcov)			gcov collection method
vmcore (for kdump)		vmcore file name
reboot (for kdump)		indicates that there was a reboot between
				test case trigger and test case verify, so
				some context has been lost.
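
As an illustration of this interface, a test case class script could
be organized as in the following sketch. It is not taken from the test
suite: the case names "example_a" and "example_b", the cases_main
function and the echo placeholders are all hypothetical, and printing
the case names on one line is an assumption.

```shell
#!/bin/sh
# Hypothetical skeleton of a test case class script (cases.sh).

enumerate()
{
    echo "example_a example_b"
}

trigger()
{
    # A real class would inject the MCE for $this_case here.
    echo "triggering $this_case"
}

get_result()
{
    # A real class would collect kernel log and mcelog output here.
    echo "collecting result of $this_case"
}

verify()
{
    # Print the verification result to stdout.
    echo "  Passed: $this_case placeholder check"
}

# Dispatch on the sub-command given by the test driver.
cases_main()
{
    case "$1" in
    enumerate)
        enumerate
        ;;
    trigger|get_result|verify)
        # The driver must set this_case for these sub-commands.
        [ -n "$this_case" ] || { echo "this_case is not set" >&2; return 1; }
        "$1"
        ;;
    *)
        echo "usage: cases.sh enumerate|trigger|get_result|verify" >&2
        return 1
        ;;
    esac
}
```

A real script would end with a call such as 'cases_main "$@"'.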

Several test case classes are provided with the test suite.
cases/soft-inj/* is based on the mce-inject MCE software injection tool.
cases/apei-inj/* is based on the apei-inj APEI hardware injection tool.

cases/<injection tool>/<class name>/cases.sh	Interface of the test case class
cases/<injection tool>/<class name>/data/	Directory containing data files
cases/<injection tool>/<class name>/refer/	Directory containing data files for
						reference MCE records, if necessary.

For documentation of the various test cases, please refer to
doc/cases/*.

2.2 Test drivers

A test driver drives the test procedure. Its main structure is a loop
over the test case classes specified in the configuration file. For
each test case class, the test driver loops over the test cases
returned by "cases.sh enumerate", and for each test case it calls
"cases.sh" to trigger, get_result and verify the test case. The test
driver also does some common work for the test cases; for example, the
kdump driver collects the vmcore file and invokes the gcovdump command
to get the gcov data file.

The interface of a test driver is driver.sh, which is usually put in
the drivers/<driver_name>/ directory. The test configuration file
should be the only command line parameter of driver.sh. Test case
classes should be specified in the test configuration file via the
CASES variable; details are given below.
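
The loop structure described above can be sketched roughly as follows.
This is not the code of any driver in the suite: CASES_DIR and
run_all_cases are illustrative names, running cases.sh through "sh"
is an assumption, and so is the idea that the driver itself prints the
"<test case name>:" header.

```shell
#!/bin/sh
# Hypothetical sketch of the core loop of a test driver.

# Directory that holds the test case classes (assumed default).
: "${CASES_DIR:=cases}"

run_all_cases()
{
    conf=$1
    . "$conf"                           # defines CASES
    for class in $CASES; do
        class_sh="$CASES_DIR/$class/cases.sh"
        for this_case in $(sh "$class_sh" enumerate); do
            export this_case            # tells cases.sh which case to run
            echo "$this_case:"
            sh "$class_sh" trigger
            sh "$class_sh" get_result
            sh "$class_sh" verify
            echo                        # blank line separates test cases
        done
    done
}
```

A real driver does more around this loop, such as collecting logs and,
for kdump, surviving a reboot between trigger and verify.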

2.3 Test configuration file

The test configuration file is a shell script that specifies
parameters for test drivers and test cases. It must be put in the
config/ directory. The parameters are represented as shell variables,
as follows:

CASES				Names of test case classes, separated by
				white space.
START_BACKGROUND		Shell command to start a background process
				during testing, used for random testing.*
STOP_BACKGROUND			Shell command to stop the background process
				during testing.
COREDIR (for kdump)		Directory containing the Linux kernel crash
				core dumps after kdump.
VMLINUX (for kdump, gcov)	vmlinux of the Linux kernel
GCOV (for gcov)			Enables gcov if set to non-zero.
KSRC_DIR (for gcov)		Kernel source code directory

* To test MCE processing in a random environment, a background
  process can be run automatically alongside MCE testing. The
  start/stop commands are specified via START_BACKGROUND and
  STOP_BACKGROUND.
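
A configuration file using these variables might look like the sketch
below. Every value is a placeholder chosen for illustration: the file
name config/example.conf, the class name (assumed to be a path
relative to cases/) and the vmlinux and kernel source locations are
assumptions, not settings shipped with the suite.

```shell
# Hypothetical test configuration, e.g. config/example.conf.
# All values are placeholders to be adapted to the test machine.

# Test case classes to run, separated by white space.
CASES="soft-inj/non-panic"

# Background process for random testing; left empty here, which is
# assumed to mean that no background process is used.
START_BACKGROUND=""
STOP_BACKGROUND=""

# kdump driver only: crash dump directory and vmlinux image.
COREDIR=/var/crash
VMLINUX=/boot/vmlinux

# gcov only: a non-zero value enables coverage collection.
GCOV=0
KSRC_DIR=/usr/src/linux
```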

2.4 Test result

After a test run, the general test result goes to
results/<driver_name>/result. Its format is as follows:

<test case name>:
  Passed: item 1 description
  Failed: item 2 description
  ...
  Passed: item n description

One blank line is used to separate test cases.
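
Because the format is line oriented, a quick summary is easy to
script. The helper below is a hypothetical example (summarize_result
is not a tool from the suite) that counts the passed and failed items
in a general result file.

```shell
#!/bin/sh
# Hypothetical helper: count "Passed:" and "Failed:" items in a
# general result file that follows the format described above.
summarize_result()
{
    awk '
        /^[[:space:]]*Passed:/ { passed++ }
        /^[[:space:]]*Failed:/ { failed++ }
        END { printf "passed=%d failed=%d\n", passed, failed }
    ' "$1"
}
```

For example: summarize_result results/<driver_name>/result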

Additional test results for the various test cases go to
results/<driver_name>/<case_name>/<xxx>. For the in-package test case
classes, the additional test results include:

results/<driver_name>/<injection_tool>/<case_name>/klog
				Kernel log during testing
results/<driver_name>/<injection_tool>/<case_name>/mcelog
				mcelog output during testing
results/<driver_name>/<injection_tool>/<case_name>/mcelog_refer
				mcelog output reference
results/<driver_name>/<injection_tool>/<case_name>/mce_64.c.gcov (for gcov)
				gcov output file


3. Tools
--------

3.1 mce-inject

mce-inject is a software MCE injection tool based on the Linux
kernel's software MCE injection mechanism. To inject an MCE into the
Linux kernel via mce-inject, a data file must be provided. Its syntax
is similar to the logging output of mcelog, with some extensions.
Please refer to the mce-inject documentation for more information.

The mce-inject program must be executable and in $PATH.

3.2 mcelog

mcelog reads /dev/mcelog and prints the stored machine check records
to stdout. The MCE test suite uses it to verify that the MCE records
generated by the kernel are the same as the reference records, which
in most cases are the same as the input records. The current git
version of mcelog is needed for the MCE test suite to work properly.
Please refer to the mcelog documentation for more information. The
latest mcelog can be obtained from git:
git://git.kernel.org/pub/scm/utils/cpu/mce/mcelog.git.

Note that the git version of mcelog must be available in $PATH.

3.3 gcovdump

gcov is a test coverage tool; the original implementation supports
user space programs only. LTP (Linux Test Project) provides kernel
gcov support. However, MCE tests involve panic or kdump, so gcovdump
was developed to dump gcov data from a kdump crash dump core. gcovdump
has been merged into LTP cvs. For more information, please refer to
the gcovdump documentation. The latest gcovdump can be obtained from
cvs:
http://ltp.cvs.sourceforge.net/viewvc/ltp/utils/analysis/gcov-kdump/.

3.4 page-types

page-types is a tool to query page types. It ships with the Linux
kernel source (2.6.32 or newer,
$KERNEL_SRC/Documentation/vm/page-types.c). It is required for MCE
apei-inj testing.

4. Usage Guide
--------------

4.1 Test with simple test driver

4.1.1 Simple test driver

The simple test driver just calls cases.sh of the test cases one by
one in a loop, so test cases must not trigger a real panic or reboot
during the test. For MCE testing, the simple test driver uses a
special processing mode that just logs everything when an MCE occurs.
It is enabled by setting the MCE parameter "tolerant=3" during
testing. "tolerant" can be set by writing to:
    /sys/devices/system/machinecheck/machinecheck0/tolerant
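
Changing "tolerant" by hand can be wrapped in a small helper that
remembers the previous value. The function below is a hypothetical
sketch (set_tolerant is not part of the suite); the file is passed as
an optional argument so that the helper can first be tried on a
scratch file. Writing to the real sysfs file requires root.

```shell
#!/bin/sh
# Hypothetical helper: write a new "tolerant" value and print the old
# one, so that it can be restored after testing.
set_tolerant()
{
    value=$1
    file=${2:-/sys/devices/system/machinecheck/machinecheck0/tolerant}
    old=$(cat "$file") || return 1
    echo "$value" > "$file" || return 1
    echo "$old"
}
```

A typical use is 'old=$(set_tolerant 3)' before the tests and
'set_tolerant "$old" >/dev/null' afterwards.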

4.1.2 Test instructions

The following are the basic test instructions. For additional features
such as gcov support, please refer to the corresponding instructions.

a. The following Linux kernel and user space tools should be installed:

   - A Linux kernel with full MCE injection support (see section 0)
   - mce-inject tool (see section 3.1)
   - mcelog with the proper version (see section 3.2)
   - page-types (see section 3.4)

b. Modify config/simple.conf or create a new test configuration
   file. Refer to section 2.3 for more details about the test
   configuration file.

c. Run "make". Carefully check for any errors.

d. It is recommended to stop cron before testing, because cron may
   run another mcelog in the background that reads events, which will
   upset the test.

      /etc/init.d/crond stop

e. As root, invoke the simple test driver as follows:

       make test

   This runs all the standard tests that do not require special
   setup.

f. The general test result goes to results/simple/result. The test
   log goes to work/simple/log. Additional test results for the
   various test cases go to results/simple/<test case>/<xxx>. For more
   details about the in-package test case classes, please refer to
   section 2.1.


4.2 Test with kdump test driver

4.2.1 kdump test driver

The kdump test driver is based on the kdump test case in the Linux
Test Project. Thanks to LTP for their excellent work!

The kdump driver helps run tests that trigger a crash/panic and
generate results and reports via kdump. The test scripts cycle through
a series of crash/panic scenarios. Each test cycle does the following:

a.  Triggers a test case that causes a crash/panic (MCE with tolerant=1).
b.  The kdump kernel boots and saves a vmcore.
c.  The system reboots to the 1st kernel.
d.  Verifies the test case and generates the result and report.
e.  After a 1 to 2 minute delay, the next test case is run.

4.2.2 Test instructions

Follow these steps to set up the kdump test driver.

The test driver is written for SuSE Linux Enterprise Server 10 (and
later releases), OpenSUSE, Fedora, Debian, as well as RedHat
Enterprise Linux 5. Since kdump is supported by the above-mentioned
distros, the test driver was written and tested on them. Contributions
towards supporting more distributions are welcome.

a. Install a Linux kernel with full MCE injection and kdump support.
   In addition to the MCE injection options in section 0, the
   following configuration options should be enabled:

   CONFIG_KEXEC=y
   CONFIG_CRASH_DUMP=y

b. Install these additional packages:

   For SLES10 or OpenSUSE distro:

     * kernel-kdump
     * kernel-source
     * kexec-tools

   For RHEL5 or Fedora distro:

     * kexec-tools
     * kernel-devel

c. Configure where to put the kdump /proc/vmcore files. The path
   should be specified via COREDIR in the test configuration file.
   By default, the kdump /proc/vmcore files are put into /var/crash.

   For SLES10 or OpenSUSE distro:
     * edit KDUMP_SAVEDIR in /etc/sysconfig/kdump
   For RHEL5 or Fedora distro:
     * edit path in /etc/kdump.conf

d. In addition to the bzImage and modules of the Linux kernel, which
   should be installed on the test machine, the vmlinux of the Linux
   kernel should be put on the test machine and specified via VMLINUX
   in the test configuration file.

e. Make sure the partition where the test driver is running has space
   for the test results and one vmcore file (the size of physical
   memory).

f. Now reboot the system. Test whether kdump works by starting kdump
   and triggering a kernel panic.

   For SLES10 or OpenSUSE distro:
    service boot.kdump restart
    chkconfig boot.kdump on
    echo "c" > /proc/sysrq-trigger

   For RHEL5 or Fedora distro:
    service kdump restart
    /sbin/chkconfig kdump on
    echo "c" > /proc/sysrq-trigger

   After the system reboots, check whether there are vmcore files. By
   default, they are in /var/crash/*/. If there are, kdump works on
   the system.

g. Create a new test configuration file or use an existing one in
   config/, such as kdump.conf. Note: not all test case classes can be
   used with the kdump test driver; see the important points below.

h. Run "make". Carefully check for any errors.

i. As root, run "drivers/kdump/driver.sh <conf>" or "make test-kdump"
   (for a full test).

j. After the test is done, the test log of the last run of the kdump
   driver is displayed on the main console.
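
The checks in steps e and f can be scripted. The two helpers below are
a hypothetical sketch, not part of the test suite; they assume that
dump files are named "vmcore" and that free space is compared against
the MemTotal value of /proc/meminfo. The meminfo file is an optional
argument so that the check can be tried with other values.

```shell
#!/bin/sh
# Hypothetical helpers for steps e and f above.

# List vmcore files below the dump directory (default /var/crash).
# Returns non-zero when none is found.
find_vmcores()
{
    coredir=${1:-/var/crash}
    found=$(find "$coredir" -type f -name vmcore 2>/dev/null)
    [ -n "$found" ] && echo "$found"
}

# Succeed if the file system holding the given directory has at least
# as much free space as the machine has physical memory (both in kB).
has_room_for_vmcore()
{
    dir=$1
    meminfo=${2:-/proc/meminfo}
    mem_kb=$(awk '/^MemTotal:/ { print $2 }' "$meminfo")
    free_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ "$free_kb" -ge "$mem_kb" ]
}
```

For example, 'has_room_for_vmcore . || echo "not enough space"' can be
run from the test suite directory before starting the kdump driver.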

A few important points to remember:

- The kdump test driver requires that a real panic be triggered when
  a test case is triggered. So not all test case classes can be used
  with the kdump test driver; for example, none of the test case
  classes for corrected MCEs can be used with it.

- If you need to stop the tests before all test cases have run, run
  "crontab -r" and "killall driver.sh" within 1 minute after the 1st
  kernel reboots. Then, if you'd like to continue the tests from that
  point, run:
    rm work/kdump/stamps/setupped
    drivers/kdump/driver.sh <conf>
  If you'd like to start the tests from the beginning, run:
    make reset
    drivers/kdump/driver.sh <conf>

- If a failure occurs when booting the kdump kernel, you'll need to
  manually reset the system so it reboots back to the 1st kernel and
  continues on to the next test. For this reason, it's best to monitor
  the tests from a console. If possible, set up a serial console (not
  a must; any type of console setup will do). If using minicom, enable
  saving of the kernel messages displayed in minicom to a file by
  pressing ctrl+a+l on the console. Otherwise, when you observe that
  the kdump kernel has failed to boot, manually copy the boot messages
  into a file to help debug the cause of the hang.

- The results are saved in results/kdump/result, which also shows
  where you are in the test run. When the "Test run complete" entry
  appears in that file, you're done. The verbose log can be found at
  work/log.

- The test machine will be unavailable for any other work during the
  test run.

4.3 Gcov support

Gcov is a test coverage tool. It can be used to discover untested
parts of a program, collect branch-taken statistics to optimize a
program, etc. In the MCE test suite, it is used to get test coverage,
that is, which C statements are covered by each test case.

Gcov support is optional. If you don't care about test coverage
information, just skip this section.

a. Make sure your kernel has gcov support. You can find the latest
   kernel gcov patches at:
       http://ltp.sourceforge.net/coverage/gcov.php

   A README for kernel gcov can be found at:
       http://ltp.sourceforge.net/coverage/gcov/readme.php

   Note: CONFIG_GCOV_ALL does not work for me. Adding the line
       EXTRA_CFLAGS += $(KBUILD_GCOV_FLAGS)
   to the respective Makefiles is more stable. For example, this line
   can be added to "linux/arch/x86/kernel/cpu/mcheck/Makefile".

b. If you want to use gcov with the kdump test driver, please install
   the gcovdump tool (see section 3.3). The latest gcovdump can be
   obtained from cvs:
   http://ltp.cvs.sourceforge.net/viewvc/ltp/utils/analysis/gcov-kdump/.

c. The Linux kernel source code should be put on the test machine.
   Its root directory should be specified in the test configuration
   file via KSRC_DIR.

d. In addition to the bzImage and modules of the Linux kernel, which
   should be installed on the test machine, the vmlinux of the Linux
   kernel should be put on the test machine and specified via VMLINUX
   in the test configuration file.

e. Make sure gcov is available on your test system. It normally comes
   with the gcc package. If the kdump test driver is used, a tool
   named gcovdump is also needed to dump *.gcda from the crash dump
   image.

f. In the test configuration file, make sure the following settings
   are present:

       # enable GCOV support
       GCOV=1
       # kernel source is needed to get gcov graph
       KSRC_DIR=<kernel source directory>
       VMLINUX=<vmlinux>

g. After testing, *.c.gcov files will be generated in the test case
   result directory, such as
   results/kdump/soft-inj/non-panic/corrected/mce_64.c.gcov.

h. To merge gcov graph data from several test cases, a tool named
   gcov_merge.py in the tools sub-directory can be used. For example,

       tools/gcov_merge.py results/kdump/soft-inj/*/*/mce_64.c.gcov

   will output the merged gcov graph from all test cases under
   soft-inj. This can be used to check the coverage of several test
   cases.

4.4 Tools

Some tools are provided to help analyze test results.

- tools/grep_result.sh

  Greps the general test result (results/<driver_name>/result) by
  test case instead of by line, because the result of one test case
  may span several lines.

  Usage:
      cat results/<driver_name>/result | tools/grep_result.sh <grep options>

  where <grep options> are the same as the options available to
  /bin/grep.

- tools/loop-mce-test

  Runs the MCE test cases in a loop. It exits on failure of any one of
  the test cases. This script uses the simple test driver.

  Usage:
      ./loop-mce-test <config_file>

  Note that only a test configuration file for the simple test driver
  can be used here.
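
To show what grepping by test case means, the sketch below uses awk's
paragraph mode to print every test case whose block matches a pattern.
It is not the implementation of tools/grep_result.sh; grep_by_case is
a hypothetical name, and unlike the real tool it accepts only a single
extended regular expression and no grep options.

```shell
#!/bin/sh
# Hypothetical sketch: read a general test result from stdin and print
# each test case block (blocks are separated by blank lines) that
# matches the given extended regular expression.
grep_by_case()
{
    pattern="$1" awk '
        BEGIN { RS = ""; ORS = "\n\n" }
        $0 ~ ENVIRON["pattern"] { print }
    '
}
```

For example, 'grep_by_case "Failed:" < results/simple/result' prints
the complete result of every test case that has a failed item.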

5. Add test cases
-----------------

5.1 Add test case to in-package test class

All in-package test classes use the mce-inject software injection tool
and follow the same structure. The steps to add a new test case are as
follows:

a. Find an appropriate test case class to add your test case to.

b. Add a new mce-inject data file to cases/soft-inj/<class name>/data/.

c. If the reference mcelog output is different from the mce-inject
   input data file, put the reference file into
   cases/soft-inj/<class name>/refer/.

d. In cases/soft-inj/<class name>/cases.sh, the shell functions
   get_result() and verify() each contain a shell "case" command. Add
   a branch to each "case" command for your test case.
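
Step d could look roughly like the following excerpt. It is purely
illustrative: the case names "existing_case" and "my_new_case" and the
echo placeholders are hypothetical, and the real functions in cases.sh
do the actual result collection and checking.

```shell
#!/bin/sh
# Hypothetical excerpt of a class script after adding a test case
# named "my_new_case". The branch bodies are placeholders.

get_result()
{
    case "$this_case" in
    existing_case)
        echo "collect result for existing_case"
        ;;
    my_new_case)        # new branch for the added test case
        echo "collect result for my_new_case"
        ;;
    *)
        echo "unknown test case: $this_case" >&2
        return 1
        ;;
    esac
}

verify()
{
    case "$this_case" in
    existing_case)
        echo "  Passed: existing_case placeholder check"
        ;;
    my_new_case)        # new branch for the added test case
        echo "  Passed: my_new_case placeholder check"
        ;;
    *)
        echo "unknown test case: $this_case" >&2
        return 1
        ;;
    esac
}
```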

5.2 Add test class

To add a new test class, add a cases.sh under a sub-directory of
cases/ and follow the test case class interface definition in section
2.1. The general result output should follow the format in section
2.4.