bkmonitorbeat keeps reporting errors when collecting CPU usage after CPU cores are hot-added #687

Closed
abbhb opened this issue Jan 15, 2025 · 1 comment

Comments

@abbhb
Contributor

abbhb commented Jan 15, 2025

Operating system:
CentOS 7 GNU/Linux
Version:
bkmonitorbeat 3.22.1392
Problem description:
When CPU cores are added to a machine without rebooting it, bkmonitorbeat keeps failing to report per-core CPU usage.

Image

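When the code hits an error and executes return err, it does not write the latest data into lastCPUTimeSlice. As a result it cannot recover on its own, and the only remedy is to restart bkmonitorbeat manually.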

The relevant code is as follows:

func getCPUStatUsage(report *CpuReport) error {
	var err error
	perCPUTimes, err := cpu.Times(true)
	if err != nil {
		return err
	}
	// Compare the lengths of the two time-slice samples; if they differ, return immediately
	lastCPUTimeSlice.Lock()
	defer lastCPUTimeSlice.Unlock()

	// Check the length of lastPerCPUTimes and re-populate it, in case the init method did not take effect
	if len(lastCPUTimeSlice.lastPerCPUTimes) <= 0 || len(perCPUTimes) != len(lastCPUTimeSlice.lastPerCPUTimes) {
		lastCPUTimeSlice.lastPerCPUTimes, err = cpu.Times(true)
		if err != nil {
			return err
		}
	}

	l1, l2 := len(perCPUTimes), len(lastCPUTimeSlice.lastPerCPUTimes)
	if l1 != l2 {
		err = fmt.Errorf("received two CPU counts %d != %d", l1, l2)
		return err
	}

	for index, value := range perCPUTimes {
		item := lastCPUTimeSlice.lastPerCPUTimes[index]
		tmp := calcTimeState(item, value)
		report.Stat = append(report.Stat, tmp)
	}

	cpuTimes, err := cpu.Times(false)
	if err != nil {
		return err
	}

	// Check the length of lastCPUTimes and re-populate it, in case the init method did not take effect
	if len(lastCPUTimeSlice.lastCPUTimes) <= 0 {
		lastCPUTimeSlice.lastCPUTimes, err = cpu.Times(false)
		if err != nil {
			return err
		}
	}

	cpuTimeStat := cpuTimes[0]
	lastCpuTimeStat := lastCPUTimeSlice.lastCPUTimes[0]
	report.TotalStat = calcTimeState(lastCpuTimeStat, cpuTimeStat)

	// Write the timeState values obtained in this round back into the shared variable
	lastCPUTimeSlice.lastCPUTimes = cpuTimes
	lastCPUTimeSlice.lastPerCPUTimes = perCPUTimes

	// per usage
	report.Usage, err = cpu.Percent(0, true)
	if err != nil {
		return err
	}

	for i := range report.Usage {
		if report.Usage[i] < 0 || int(report.Usage[i]) > 100 {
			report.Usage[i] = 0.0
		}
	}
	// total usage
	total, err := cpu.Percent(0, false)
	if err != nil {
		return err
	}

	report.TotalUsage = total[0]
	if report.TotalUsage < 0 || report.TotalUsage > 100 {
		report.TotalUsage = 0.0
	}
	return nil
}
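
The recovery problem described above (an error return that leaves the stored sample stale) can be illustrated with a small standalone sketch. This is not bkmonitorbeat's code and not the fix that was actually merged; it does not call gopsutil, and the names cpuTimes, baseline and usage are hypothetical. It assumes the intended behavior is to always replace the stored sample before returning, so that a core-count change costs at most one collection cycle instead of requiring a restart.

```go
package main

import (
	"fmt"
	"sync"
)

// cpuTimes is a hypothetical, simplified per-core sample (cumulative seconds).
type cpuTimes struct {
	Busy  float64
	Total float64
}

// baseline holds the previous per-core sample, guarded by a mutex.
type baseline struct {
	mu   sync.Mutex
	last []cpuTimes
}

// usage returns per-core busy percentages between the stored sample and cur.
// The stored sample is replaced with a copy of cur before any error return,
// so a mismatch in core count is reported once and then clears itself.
func (b *baseline) usage(cur []cpuTimes) ([]float64, error) {
	b.mu.Lock()
	defer b.mu.Unlock()

	prev := b.last
	b.last = append([]cpuTimes(nil), cur...)

	if len(prev) != len(cur) {
		return nil, fmt.Errorf("CPU count changed from %d to %d, baseline reset", len(prev), len(cur))
	}

	out := make([]float64, len(cur))
	for i := range cur {
		elapsed := cur[i].Total - prev[i].Total
		if elapsed <= 0 {
			continue // no elapsed time for this core; leave usage at 0
		}
		out[i] = (cur[i].Busy - prev[i].Busy) / elapsed * 100
	}
	return out, nil
}

func main() {
	b := &baseline{}
	samples := [][]cpuTimes{
		{{Busy: 10, Total: 100}, {Busy: 20, Total: 100}}, // first sample only seeds the baseline
		{{Busy: 12, Total: 104}, {Busy: 21, Total: 104}}, // normal cycle with 2 cores
		make([]cpuTimes, 4),                              // cores hot-added: 2 -> 4
		{{Busy: 1, Total: 4}, {Busy: 2, Total: 4}, {Busy: 0, Total: 4}, {Busy: 4, Total: 4}},
	}
	for i, s := range samples {
		pct, err := b.usage(s)
		fmt.Printf("cycle %d: usage=%v err=%v\n", i, pct, err)
	}
}
```

In this sketch the cycle in which the core count changes still returns an error, but the following cycle succeeds because the stored sample already has the new length.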
@abbhb
Contributor Author

abbhb commented Jan 20, 2025

This issue has already been fixed in #595 and #128; simply upgrade to a newer version.

@abbhb abbhb closed this as completed Jan 20, 2025