MCE test suite HOWTO
====================

11 November 2008

Huang Ying

Section 4.2 (Test with kdump test driver) is based on the README of
the LTP kdump test case.

Abstract
--------

This document explains the structure and design of the MCE test suite,
the kernel patches and user space tools needed for automatic tests, a
usage guide, and how to add new test cases to the test suite.

0. Quick shortcut
------------------

- Install a Linux kernel with full MCE injection support, that is, the
  latest Linux kernel (2.6.31) plus the MCE injection enhancement
  patchset at: http://ftp.kernel.org/pub/linux/kernel/people/yhuang/mce/
  Make sure the following configuration options are enabled:

        CONFIG_X86_MCE=y
        CONFIG_X86_MCE_INTEL=y
        CONFIG_X86_MCE_INJECT=y or CONFIG_X86_MCE_INJECT=m

- Get the git version of mcelog from
  git://git.kernel.org/pub/scm/utils/cpu/mce/mcelog.git
  and install it in /usr/sbin (or rather, make sure it is the first
  mcelog found in your $PATH):

        git clone git://git.kernel.org/pub/scm/utils/cpu/mce/mcelog.git
        cd mcelog
        make
        sudo make install

- Get the git version of mce-inject from
  git://git.kernel.org/pub/scm/utils/cpu/mce/mce-inject.git:

        git clone git://git.kernel.org/pub/scm/utils/cpu/mce/mce-inject.git
        cd mce-inject
        make
        sudo make install

- Install the page-types tool (see section 3.4), which comes with the
  Linux kernel source (2.6.32 or newer):

        cd $KERNEL_SRC/Documentation/vm/
        gcc -o page-types page-types.c
        sudo cp page-types /usr/bin/

- Run "make test". This runs the basic tests, but not the more
  complicated kdump ones; see section 4.2 for those.

1. Introduction
---------------

The MCE test suite is a collection of tools and test scripts for
testing the MCE processing features of the Linux kernel. The goal is
to cover most Linux kernel MCE processing code paths and features with
automated tests.
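Before going further, it may help to confirm that the prerequisites
from section 0 are in place. The following is a minimal sketch that is
not part of the test suite: the helper names check_mce_config and
check_tools are hypothetical, and it assumes the kernel configuration
is available as a plain file such as /boot/config-$(uname -r), which is
a common distribution convention rather than something the suite
requires.

```shell
#!/bin/bash
# Hypothetical helpers (not shipped with the MCE test suite) that check
# the prerequisites listed in section 0.

# Succeed if the given kernel config file enables the MCE options from
# section 0; CONFIG_X86_MCE_INJECT may be built in (y) or a module (m).
check_mce_config()
{
    local config=$1
    grep -qx 'CONFIG_X86_MCE=y' "$config" &&
        grep -qx 'CONFIG_X86_MCE_INTEL=y' "$config" &&
        grep -qxE 'CONFIG_X86_MCE_INJECT=(y|m)' "$config"
}

# Print each required user space tool that cannot be found in $PATH,
# and fail if any of them is missing.
check_tools()
{
    local tool missing=0
    for tool in mcelog mce-inject page-types; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing from \$PATH: $tool"
            missing=1
        fi
    done
    return "$missing"
}
```

For example, "check_mce_config /boot/config-$(uname -r) && check_tools"
could be run before "make test". It only reads files and $PATH, and it
checks that the tools are present, not that the mcelog found first is
the git version.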

If you just want to start testing as quickly as possible, you can skip
sections 2 and 3 and go directly to section 4.1.


2. Structure
------------

The main design goal is to reuse test cases across various test
methods (represented as test drivers), such as the kdump based method,
the kernel MCE panic log (tolerant=3) based method, etc.

2.1 Test cases

Test cases are grouped into test case classes. The test cases in one
class share similar triggering, result collecting and result verifying
methods, so they can be used with the same set of test drivers. The
interface of a test case class is a shell script, usually named
cases.sh, in a sub-directory of cases/. The test case class shell
script should support the following command line options:

cases.sh enumerate      enumerate the test cases in the class and
                        print the test case names to stdout
cases.sh trigger        trigger the test case
cases.sh get_result     get the result of the test case
cases.sh verify         verify the result of the test case and print
                        the verification result to stdout

When cases.sh is executed with trigger, get_result or verify, the test
case is specified via the environment variable this_case, which must
be one of the test case names printed by "cases.sh enumerate".

Other environment variables are also used to pass information from
the driver to the test cases, such as:

this_case                 name of the current test case
driver                    name of the test driver
klog                      name of the file which holds the kernel log
                          during the test
KSRC_DIR  (for gcov)      kernel source code directory
GCOV      (for gcov)      gcov collection method
vmcore    (for kdump)     vmcore file name
reboot    (for kdump)     indicates that there was a reboot between
                          test case trigger and test case verify, so
                          some context has been lost

Several test case classes are provided with the test suite.
cases/soft-inj/* is based on the mce-inject MCE software injection
tool. cases/apei-inj/* is based on the apei-inj APEI hardware
injection tool.
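To make the cases.sh interface described above more concrete, here is
a minimal sketch of what a test case class script could look like. It
is not one of the in-package scripts: the case names, the RESULT_DIR
variable, the require_case helper, the data file naming and the check
in verify() are all hypothetical, and real classes collect and verify
results in more detail.

```shell
#!/bin/bash
# Hypothetical skeleton of cases/<injection tool>/<class name>/cases.sh.
# Case names, RESULT_DIR and the verify() check are illustrative only.

CASE_DIR=$(dirname "$0")

enumerate()
{
    # Print one test case name per line on stdout.
    echo example_corrected
    echo example_uncorrected
}

require_case()
{
    # $this_case (set by the driver) must be a name printed by enumerate.
    if ! enumerate | grep -qx -- "$this_case"; then
        echo "unknown test case: '$this_case'" >&2
        return 1
    fi
}

trigger()
{
    require_case || return 1
    # Assumption: each case has a data file named after it in data/.
    mce-inject "$CASE_DIR/data/$this_case"
}

get_result()
{
    require_case || return 1
    # Save the records that mcelog reads from /dev/mcelog.
    mcelog > "$RESULT_DIR/mcelog"
}

verify()
{
    require_case || return 1
    # Print one "Passed:"/"Failed:" line per checked item (section 2.4);
    # $klog is the kernel log file passed in by the driver.
    if grep -q "Machine check" "$klog"; then
        echo "  Passed: kernel logged the machine check"
    else
        echo "  Failed: no machine check found in kernel log"
    fi
}

# Dispatch on the operation given by the test driver.
if [ $# -gt 0 ]; then
    case "$1" in
        enumerate|trigger|get_result|verify) "$1" ;;
        *) echo "usage: $0 enumerate|trigger|get_result|verify" >&2
           exit 1 ;;
    esac
fi
```

A driver would then run, for instance,
"this_case=example_corrected cases.sh verify" after trigger and
get_result for that case.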

cases/<injection tool>/<class name>/cases.sh
        Interface of the test case class
cases/<injection tool>/<class name>/data/
        Directory containing data files
cases/<injection tool>/<class name>/refer/
        Directory containing data files for reference MCE records,
        if necessary

For documentation of the various test cases, please refer to
doc/cases/*.

2.2 Test drivers

Test drivers drive the test procedure. The main structure of a test
driver is a loop over the test case classes specified in the
configuration file. For each test case class, the test driver loops
over the test cases returned by "cases.sh enumerate", and for each
test case it calls "cases.sh" to trigger, get_result and verify the
test case. The test driver also does some work common to all test
cases; for example, the kdump driver collects the vmcore file and
invokes the gcovdump command to get the gcov data file.

The interface of a test driver is driver.sh, which is usually put in
the drivers/<driver_name>/ directory. The test configuration file
should be passed as the only command line parameter of driver.sh.
Test case classes should be specified in the test configuration file
via the CASES variable; see the details below.

2.3 Test configuration file

A test configuration file is a shell script that specifies parameters
for test drivers and test cases. It must be put in the config/
directory. The parameters are represented as shell variables as
follows:

CASES                       Names of test case classes, separated by
                            white space.
START_BACKGROUND            Shell command to start a background process
                            during testing, used for random testing.*
STOP_BACKGROUND             Shell command to stop the background process
                            during testing.
COREDIR   (for kdump)       Directory containing the Linux kernel crash
                            core dump after kdump.
VMLINUX   (for kdump, gcov) vmlinux of the Linux kernel
GCOV      (for gcov)        Enable gcov if set to non-zero.
KSRC_DIR  (for gcov)        Kernel source code directory

* To test MCE processing in a random environment, a background process
  can be run automatically during MCE testing. Its start and stop
  commands are specified via START_BACKGROUND and STOP_BACKGROUND.

2.4 Test result

After testing, the general test result goes to
results/<driver_name>/result. The format of the general test result is
as follows:

<test case name>:
  Passed: item 1 description
  Failed: item 2 description
  ...
  Passed: item n description

One blank line is used to separate test cases.

Additional test results for the various test cases go to
results/<driver_name>/<case_name>/<xxx>. For in-package test case
classes, the additional test results include:

results/<driver_name>/<injection_tool>/<case_name>/klog
        Kernel log during testing
results/<driver_name>/<injection_tool>/<case_name>/mcelog
        mcelog output during testing
results/<driver_name>/<injection_tool>/<case_name>/mcelog_refer
        Reference mcelog output
results/<driver_name>/<injection_tool>/<case_name>/mce_64.c.gcov (for gcov)
        gcov output file


3. Tools
--------

3.1 mce-inject

mce-inject is a software MCE injection tool based on the Linux kernel
software MCE injection mechanism. To inject an MCE into the Linux
kernel via mce-inject, a data file must be provided. Its syntax is
similar to the log output of mcelog, with some extensions. Please
refer to the mce-inject documentation for more information.

The mce-inject program must be executable and in $PATH.

3.2 mcelog

mcelog reads /dev/mcelog and prints the stored machine check records
to stdout. The MCE test suite uses it to verify that the MCE records
generated by the kernel are the same as the reference records, which
in most cases are the same as the input records. The current git
version of mcelog is needed for the MCE test suite to work properly.
Please refer to the mcelog documentation for more information. The
latest mcelog can be obtained as a git snapshot from
git://git.kernel.org/pub/scm/utils/cpu/mce/mcelog.git

Note that you need the git version of mcelog available in $PATH.

3.3 gcovdump

gcov is a test coverage tool whose original implementation works for
user space programs only. LTP (Linux Test Project) provides kernel
gcov support. But MCE tests involve panic or kdump, so gcovdump was
developed to dump gcov data from the kdump crash dump core. gcovdump
has been merged into LTP CVS. For more information, please refer to
the gcovdump documentation. The latest gcovdump can be obtained from
CVS:
http://ltp.cvs.sourceforge.net/viewvc/ltp/utils/analysis/gcov-kdump/

3.4 page-types

page-types is a tool to query page types, which comes with the Linux
kernel source (2.6.32 or newer,
$KERNEL_SRC/Documentation/vm/page-types.c). It is required for MCE
apei-inj testing.

4. Usage Guide
--------------

4.1 Test with simple test driver

4.1.1 Simple test driver

The simple test driver just calls the cases.sh of the test cases one
by one in a loop, so test cases are not permitted to trigger a real
panic or reboot during the test. For MCE testing, the simple test
driver uses a special processing mode that just logs everything when
an MCE occurs. It is enabled by setting the MCE parameter "tolerant=3"
during testing. "tolerant" can be set by writing to:

        /sys/devices/system/machinecheck/machinecheck0/tolerant

4.1.2 Test instructions

The following are the basic test instructions. For additional
features such as gcov support, please refer to the corresponding
instructions.

a. Install the following Linux kernel and user space tools:

   - A Linux kernel with full MCE injection support (see section 0)
   - mce-inject tool (see section 3.1)
   - mcelog with the proper version (see section 3.2)
   - page-types (see section 3.4)

b. Modify config/simple.conf or create a new test configuration
   file. Refer to section 2.3 for more information about test
   configuration files.

c. Run "make". Carefully check for any errors.

d. It is recommended to stop cron before testing, because cron may
   run another mcelog in the background that reads the events and
   upsets the test:

        /etc/init.d/crond stop

e. As root, invoke the simple test driver on the test configuration
   file, that is, "drivers/simple/driver.sh <conf>", or run
   "make test" to do all the standard tests that do not require
   special setup.

f. The general test result goes to results/simple/result, and the
   test log goes to work/simple/log. Additional test results for the
   various test cases go to results/simple/<test case>/<xxx>. For more
   details about the results of in-package test case classes, please
   refer to section 2.4.


4.2 Test with kdump test driver

4.2.1 kdump test driver

The kdump test driver is based on the kdump test case in the Linux
Test Project; thanks to LTP for their excellent work!

The kdump driver helps run tests which trigger a crash/panic, and
generates results and reports via kdump. The test scripts cycle
through a series of crash/panic scenarios. Each test cycle does the
following:

a. Triggers a test case which causes a crash/panic (MCE with
   tolerant=1).
b. The kdump kernel boots and saves a vmcore.
c. The system reboots into the 1st kernel.
d. Verifies the test case, and generates the result and report.
e. After a 1 to 2 minute delay, the next test case is run.

4.2.2 Test instructions

Follow these steps to set up the kdump test driver.

The test driver is written for SuSE Linux Enterprise Server 10 (and
later releases), OpenSUSE, Fedora, Debian, as well as RedHat
Enterprise Linux 5. Since kdump is supported by the above-mentioned
distros, the test driver was written and tested on them. Contributions
towards supporting more distributions are welcome.

a. Install a Linux kernel with full MCE injection and kdump support.
   In addition to the MCE injection support described in section 0,
   the following configuration options should be enabled:

        CONFIG_KEXEC=y
        CONFIG_CRASH_DUMP=y

b. Install these additional packages:

   For SLES10 or OpenSUSE:

     * kernel-kdump
     * kernel-source
     * kexec-tools

   For RHEL5 or Fedora:

     * kexec-tools
     * kernel-devel

c. Configure where to put the kdump /proc/vmcore files. The path
   should be specified via COREDIR in the test configuration file.
   By default, the kdump /proc/vmcore files are put into /var/crash.

   For SLES10 or OpenSUSE:
     * edit KDUMP_SAVEDIR in /etc/sysconfig/kdump
   For RHEL5 or Fedora:
     * edit path in /etc/kdump.conf

d. Besides installing the bzImage and modules of the Linux kernel on
   the test machine, put the vmlinux of the same kernel on the test
   machine and specify it via VMLINUX in the test configuration file.

e. Make sure the partition where the test driver is running has
   enough space for the test results and one vmcore file (roughly the
   size of physical memory).

f. Now reboot the system. Test whether kdump works by starting kdump
   and triggering a kernel panic.

   For SLES10 or OpenSUSE:

        service boot.kdump restart
        chkconfig boot.kdump on
        echo "c" > /proc/sysrq-trigger

   For RHEL5 or Fedora:

        service kdump restart
        /sbin/chkconfig kdump on
        echo "c" > /proc/sysrq-trigger

   After the system reboots, check whether there are vmcore files. By
   default, they are in /var/crash/*/. If so, kdump works on the
   system.

g. Create a new test configuration file or use an existing one in
   config/, such as kdump.conf. Note: not all test case classes can
   be used with the kdump test driver; see "important points" below.

h. Run "make". Carefully check for any errors.

i. As root, run "drivers/kdump/driver.sh <conf>" or "make test-kdump"
   (for a full test).

j. After the test is done, the test log of the last run of the kdump
   driver will be displayed on the main console.

A few important points to remember:

- The kdump test driver requires that a real panic be triggered when
  a test case is triggered. So not all test case classes can be used
  with the kdump test driver; for example, none of the test case
  classes for corrected MCEs can be used with it.

- If you need to stop the tests before all test cases have run, run
  "crontab -r" and "killall driver.sh" within 1 minute after the 1st
  kernel reboots. Then, if you'd like to carry on with the tests from
  that point, run:

        rm work/kdump/stamps/setupped
        drivers/kdump/driver.sh <conf>

  If you'd like to start the tests from the beginning, run:

        make reset
        drivers/kdump/driver.sh <conf>

- If a failure occurs when booting the kdump kernel, you'll need to
  reset the system manually so that it reboots back into the 1st
  kernel and continues on to the next test. For this reason, it's
  best to monitor the tests from a console. If possible, set up a
  serial console (not a must; any type of console will do). If using
  minicom, enable saving of the kernel messages displayed in minicom
  to a file by pressing Ctrl-A L on the console. Otherwise, when you
  observe that the kdump kernel has failed to boot, manually copy the
  boot messages into a file to help debug the cause of the hang.

- The results are saved in results/kdump/result, which also shows
  where you are in the test run. When the "Test run complete" entry
  appears in that file, you're done. A verbose log can be found in
  work/log.

- The test machine will be unavailable for any other work during the
  test run.

4.3 Gcov support

Gcov is a test coverage tool.
It can be used to discover untested parts of programs, collect
branch-taken statistics to optimize programs, etc. In the MCE test
suite, it is used to get test coverage, that is, which C statements
are covered by each test case.

Gcov support is optional; if you don't care about test coverage
information, just skip this section.

a. Make sure your kernel has gcov support. You can find the latest
   kernel gcov patches at:

        http://ltp.sourceforge.net/coverage/gcov.php

   A README for kernel gcov can be found at:

        http://ltp.sourceforge.net/coverage/gcov/readme.php

   Note: CONFIG_GCOV_ALL does not work for me. Adding the line

        EXTRA_CFLAGS += $(KBUILD_GCOV_FLAGS)

   to the respective Makefiles is more stable. For example, this line
   can be added to "linux/arch/x86/kernel/cpu/mcheck/Makefile".

b. If you want to use gcov with the kdump test driver, please install
   the gcovdump tool (see section 3.3). The latest gcovdump can be
   obtained from CVS:
   http://ltp.cvs.sourceforge.net/viewvc/ltp/utils/analysis/gcov-kdump/

c. The Linux kernel source code should be put on the test machine.
   Its root directory should be specified in the test configuration
   file via KSRC_DIR.

d. Besides installing the bzImage and modules of the Linux kernel on
   the test machine, put the vmlinux of the same kernel on the test
   machine and specify it via VMLINUX in the test configuration file.

e. Make sure gcov is available on your test system. It normally comes
   with the gcc package. If the kdump test driver is used, a tool
   named gcovdump is also needed to dump *.gcda files from the crash
   dump image.

f. In the test configuration file, make sure the following settings
   are present:

        # enable GCOV support
        GCOV=1
        # kernel source is needed to get gcov graph
        KSRC_DIR=<kernel source directory>
        VMLINUX=<vmlinux>

g. After testing, *.c.gcov files will be generated in the test case
   result directory, such as
   results/kdump/soft-inj/non-panic/corrected/mce_64.c.gcov.

h. To merge gcov graph data from several test cases, a tool named
   gcov_merge.py in the tools sub-directory can be used. For example,

        tools/gcov_merge.py results/kdump/soft-inj/*/*/mce_64.c.gcov

   outputs the merged gcov graph of all test cases under soft-inj.
   This can be used to check the coverage of several test cases.

4.4 Tools

Some tools are provided to help analyze test results.

- tools/grep_result.sh

  Greps the general test result (results/<driver_name>/result) per
  test case instead of per line, because the result of one test case
  may span several lines.

  Usage:
        cat results/<driver_name>/result | tools/grep_result.sh <grep options>

  where <grep options> are the same options as accepted by /bin/grep.

- tools/loop-mce-test

  Runs the MCE test cases in a loop. It exits on failure of any one of
  the test cases. This script uses the simple test driver.

  Usage:
        ./loop-mce-test <config_file>

  Note that only a configuration file for the simple test driver can
  be used here.

5. Add test cases
-----------------

5.1 Add test case to in-package test class

All in-package test classes use the mce-inject software injection tool
and follow the same structure. The steps to add a new test case are as
follows:

a. Find an appropriate test case class to add your test case to.

b. Add a new mce-inject data file to cases/soft-inj/<class name>/data/.

c. If the reference mcelog output differs from the mce-inject input
   data file, put the reference file into
   cases/soft-inj/<class name>/refer/.

d. In cases/soft-inj/<class name>/cases.sh, the shell functions
   get_result() and verify() each contain a "case" statement. Add a
   branch for your test case to each of these "case" statements.
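Steps c and d above can be sketched as follows. This is a hypothetical
excerpt, not the in-package code: my_new_case, CASE_DIR, RESULT_DIR and
reference_file are illustrative names, the branches for existing cases
are omitted, and the plain diff stands in for a real record comparison,
which would have to tolerate fields that legitimately differ between
runs.

```shell
#!/bin/bash
# Hypothetical excerpt of cases/soft-inj/<class name>/cases.sh after a
# branch for a new test case "my_new_case" has been added.

CASE_DIR=${CASE_DIR:-$(dirname "$0")}

# Step c: use refer/<case> as the reference when it exists; otherwise
# the mce-inject input file data/<case> doubles as the reference.
reference_file()
{
    if [ -f "$CASE_DIR/refer/$1" ]; then
        echo "$CASE_DIR/refer/$1"
    else
        echo "$CASE_DIR/data/$1"
    fi
}

# Step d: one new branch in the "case" statement of get_result() ...
get_result()
{
    case "$this_case" in
        my_new_case)
            mcelog > "$RESULT_DIR/mcelog"
            ;;
        *)
            echo "get_result: unknown case $this_case" >&2
            return 1
            ;;
    esac
}

# ... and one in the "case" statement of verify().
verify()
{
    case "$this_case" in
        my_new_case)
            # A plain diff is a simplification of the real comparison.
            if diff -q "$RESULT_DIR/mcelog" "$(reference_file "$this_case")" >/dev/null; then
                echo "  Passed: mcelog output matches the reference"
            else
                echo "  Failed: mcelog output differs from the reference"
            fi
            ;;
        *)
            echo "verify: unknown case $this_case" >&2
            return 1
            ;;
    esac
}
```

Keeping the refer/ lookup in one helper means a new case only needs a
refer/ file when its expected mcelog output differs from its input.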

5.2 Add test class

To add a new test class, add a cases.sh in a sub-directory of cases/
and follow the test case class interface defined in section 2.1. The
general result output should follow the format in section 2.4.
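While developing a new class, it can help to run it by hand through
the loop described in section 2.2. The function below is a
hypothetical stand-in for a test driver, not drivers/simple/driver.sh:
it passes only this_case (not driver, klog or the other variables from
section 2.1), invokes the script through bash so the executable bit is
not required, and prints results in the section 2.4 format. Note that
it really calls trigger, so with a real class it injects MCEs and must
run under the same conditions as the simple test driver (section 4.1).

```shell
#!/bin/bash
# Hypothetical hand-run harness for a new test case class; it follows
# the loop of section 2.2 but is not one of the suite's drivers.

# Usage: run_class <path to cases.sh>
run_class()
{
    local cases_sh=$1 name
    for name in $(bash "$cases_sh" enumerate); do
        echo "$name:"
        # The current case is passed via the this_case variable.
        this_case=$name bash "$cases_sh" trigger ||
            echo "  Failed: trigger"
        this_case=$name bash "$cases_sh" get_result ||
            echo "  Failed: get_result"
        this_case=$name bash "$cases_sh" verify
        # One blank line separates test cases (section 2.4).
        echo
    done
}
```

For example, "run_class cases/<injection tool>/<class name>/cases.sh"
prints one block per enumerated case, which can be compared by eye
with the format in section 2.4 before the class is listed in CASES.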