
IO-class setting for passing through big IO; data validation test with vdbench detected an error #1444

Open
phyorat opened this issue Mar 8, 2023 · 2 comments
Labels: bug, v24.9


phyorat commented Mar 8, 2023

Description

An IO class can be used to pass big IO (e.g. requests larger than 128KB) straight through to the HDD, skipping the cache; this gives higher performance and better cache efficiency. Example IO-class configuration file:

IO class id,IO class name,Eviction priority,Allocation
0,unclassified,22,0
1,request_size:le:131072,1,1
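
For reference, we load this configuration with casadm along these lines (the cache id and file path are just examples from our setup):

casadm --io-class --load-config --cache-id 1 --file /etc/opencas/ioclass-config.csv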

After loading this IO-class configuration, 128K IOs are written directly to the HDD; likewise, if the data is not in the cache, reads are served from the HDD instead of the cache.
On the other hand, if part of the requested data is already cached, for example the first 64K of a 128K request, then the read should take 64K from the cache and the remaining 64K from the HDD:

|------------ 64K cached ------------|---------- 64K from HDD ----------|

This is the expected data "splicing". The actual IO pattern we observed was:
1. A 64K write IO goes into the cache; this is the first write to the block, with key = 1.
2. The 64K is read back and verified OK.
3. A 64K write IO (with key = 2) is merged with another 64K write into one request, and the 128KB is written directly to the HDD.
4. A 64K read fails verification (it returns key = 1, i.e. stale data).
(We suspect the stale 64K was wrongly read from the cache.)

Steps 3-4 may also be:
3. A 64K write IO (with key = 2), plus a second 64K write IO.
4. The two 64K read IOs are merged into one, reading 128K directly from the HDD (because of IO-class rule [1]).
5. Verification fails (key = 1 is returned, indicating stale data).
(We suspect the stale 128K was read directly from the HDD.)
A rough dd sketch of this pattern is given below.
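
The sketch is only an approximation of the pattern above, not the exact vdbench workload; the device name (/dev/cas1-1, the exported CAS device), offsets and helper files are illustrative:

# steps 1-2: write the first 64K with pattern A and verify the read-back
dd if=/dev/urandom of=pattern_a bs=64k count=1
dd if=pattern_a of=/dev/cas1-1 bs=64k count=1 oflag=direct
dd if=/dev/cas1-1 of=readback_a bs=64k count=1 iflag=direct
cmp pattern_a readback_a    # verifies OK

# step 3: overwrite the whole 128K region in one request, so the IO-class
# rule sends it straight to the HDD (models the merged 64K+64K writes)
dd if=/dev/urandom of=pattern_b bs=128k count=1
dd if=pattern_b of=/dev/cas1-1 bs=128k count=1 oflag=direct

# step 4: read the full 128K back; with the bug, the first 64K may still be
# the old (key = 1) data
dd if=/dev/cas1-1 of=readback_b bs=128k count=1 iflag=direct
cmp pattern_b readback_b    # miscompares when the bug hits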

We ran data validation for this scenario with vdbench, and a data validation error was detected:

21:05:24.364 hd2-0: dvpost: /dev/vdb sd4 sd4 0x00000000 0x234520000 131072 0x0 0x5ecf4d1ed319c 0x11 0x2 0x70 0x0 0 36028797018963971
21:05:24.364 hd2-0:
21:05:24.364 hd2-0: Data Validation error for sd=sd4,lun=/dev/vdb
21:05:24.364 hd2-0: Block lba: 0x234520000; sector lba: 0x234520000; xfersize: 131072; relative sector in block: 0x00 ( 0)
21:05:24.364 hd2-0: ===> Data Validation Key miscompare.
21:05:24.364 hd2-0: ===> Data miscompare.
21:05:24.364 hd2-0: The sector below was written Tuesday, November 8, 2022 20:38:41.711 CST
21:05:24.364 hd2-0: 0x000 00000002 34520000 ........ ........ 00000002 34520000 0005ecf4 d1ed319c
21:05:24.364 hd2-0: 0x010 02..0000 73643420 20202020 00000000 01700000 20346473 20202020 00000000
21:05:24.364 hd2-0: Key miscompare always implies Data miscompare. Remainder of data suppressed.

This error shows that the tool wrote data with key "02xxxx", but the data read back from the core is "01xxxx".
The key point is that right after the error occurred we read the data directly from the core device, and it was correct ("02xxxx"). So there seems to be a data alignment/validation issue between the cache and the HDD, within a very small time window (several milliseconds)?

After removing this IO-class configuration and testing again, no data validation error occurred any more.

In addition, the configuration <Sequential cutoff policy: always; --threshold 128KB> can also trigger the data validation error.
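
For reference, that sequential cutoff setting corresponds to something like the following casadm call (cache/core ids are illustrative; the threshold is in KiB):

casadm --set-param --name seq-cutoff --cache-id 1 --core-id 1 --policy always --threshold 128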

Expected Behavior

No data alignment/validation issue between the cache and the HDD when an IO-class is set so that big IOs skip the cache.

Actual Behavior

There is a data alignment/validation issue between the cache and the HDD when the data of a read request is only partially cached.

Steps to Reproduce

  1. Set an IO-class to pass through block IOs larger than or equal to 128KB.
  2. Use vdbench to test write and read data validation; the data sits on a distributed block-storage system whose lower storage is an Open CAS NVMe-cached HDD.
  3. vdbench reports a data validation error.
  (A sketch of the Open CAS setup commands follows this list.)
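
The Open CAS side of the setup was roughly as follows (device names and ids are placeholders; the vdbench job itself is our internal config):

casadm --start-cache --cache-device /dev/nvme0n1 --cache-mode wb --cache-line-size 8
casadm --add-core --cache-id 1 --core-device /dev/sdb
casadm --io-class --load-config --cache-id 1 --file /etc/opencas/ioclass-config.csv
# point the distributed block-storage layer (and vdbench) at the exported
# /dev/cas1-1 device, then run the write/read validation workload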

Context

This is the base block storage for a distributed block system; we need to guarantee that data validation is correct.

Possible Fix

Perhaps the metadata is not kept strictly consistent, or expires, between the different IO stages (within milliseconds).

Logs

No direct evidence so far, but the reverse verification (removing that IO-class) can serve as a clue.

Your Environment

  • OpenCAS version (commit hash or tag):
    22.03.0.0666.release
  • Operating System:
    CentOS Linux release 7.6.1810 (Core)
  • Kernel version:
    5.10.38-21.hl02.el7.x86_64
  • Cache device type (NAND/Optane/other):
    NAND
  • Core device type (HDD/SSD/other):
    HDD
  • Cache configuration:
    • Cache mode: wb
    • Cache line size: 8
    • Promotion policy: always
    • Cleaning policy: alru
    • Sequential cutoff policy: never
  • Other (e.g. lsblk, casadm -P, casadm -L)
phyorat added the bug label on Mar 8, 2023
katlapinka self-assigned this on Jul 16, 2024
@mmichal10
Contributor

Hi @phyorat,

thank you for posting the issue. Do you happen to still have the vdbench config you used for your test?

@mmichal10
Contributor

I came up with a fio config to mimic vdbench's behaviour:

[dc_repro]
filename=/dev/cas1-1
ioengine=libaio
iodepth=1
direct=1
numjobs=1

# Generate new offset for every second write
rw=randwrite:2
rw_sequencer=identical

bssplit=64k/50:256k/50

# This ensures that every 64K write will be followed by 256K write
number_ios=2
loops=10000

verify=md5
# Verify after every write
verify_backlog=1
# Stop FIO if data corruption (DC) is detected
verify_fatal=1
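
Saved as e.g. dc_repro.fio (the filename is arbitrary), the job can be run with:

fio dc_repro.fio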
