Index out of range #13

Open
dfurmanov opened this issue Oct 29, 2020 · 7 comments

@dfurmanov

Version 0.9.7

Stack trace

panic: runtime error: index out of range [18963] with length 17388
goroutine 75 [running]:
github.com/shakinm/xlsReader/xls/record.(*LabelSSt).GetString(0xc0004033a0, 0xc15b6c, 0x10)
	/gocode/pkg/mod/github.com/shakinm/xlsReader@v0.9.7/xls/record/labelSst.go:43 +0x70
github.com/shakinm/xlsReader/xls/record.(*Format).GetFormatString(0xc0000aaa20, 0x1043a20, 0xc0004033a0, 0x16d6740, 0x1)
	/gocode/pkg/mod/github.com/shakinm/xlsReader@v0.9.7/xls/record/format.go:125 +0xa85
github.com/dfurmanov/myapp/officetotxt.XLStoCSV(0x102dee0, 0xc0003025a0, 0xc0003c6630, 0x29, 0x0, 0xc0003c664f, 0x6)
	/gocode/src/github.com/dfurmanov/myapp/officetotxt/xls2csv.go:30 +0x30a

My code (sheetIndex = 0)

func XLStoCSV(w io.Writer, excelFileName string, sheetIndex int) error {
	workbook, err := xls.OpenFile(excelFileName)
	if err != nil {
		return err
	}

	sheet, err := workbook.GetSheet(sheetIndex)
	if err != nil {
		return err
	}

	cw := csv.NewWriter(w)

	for i := 0; i <= sheet.GetNumberRows(); i++ {
		if row, err := sheet.GetRow(i); err == nil {
			cols := row.GetCols()
			values := make([]string, len(cols))
			for i, cell := range cols {
				xfIndex := cell.GetXFIndex()
				formatIndex := workbook.GetXFbyIndex(xfIndex)
				format := workbook.GetFormatByIndex(formatIndex.GetFormatIndex())
				values[i] = format.GetFormatString(cell)
			}
			trimmedValues := TrimLatterEmptyValues(values)
			if len(trimmedValues) > 0 {
				err = cw.Write(trimmedValues)
				if err != nil {
					return err
				}
			}
		}
	}

	cw.Flush()
	return cw.Error()
}

Input file
error.xls.zip

@dougwinsby

I haven't tested it, but I think your for loop condition should use < instead of <=, because row indices are zero-based.

@dfurmanov
Author

@dougwinsby Interesting, I'll check that. However, this loop comes from the example the author gives in the README.

@dougwinsby

I would try upgrading to 0.9.8. (See #8)

@dfurmanov
Author

dfurmanov commented Oct 29, 2020

@dougwinsby I have tried both: using < instead of <= and upgrading to v0.9.8. The same error persists.

github.com/shakinm/xlsReader/xls/record.(*LabelSSt).GetString(0xc00046d0e0, 0xc5004c, 0x10)
	/opt/pr/gocode/pkg/mod/github.com/shakinm/[email protected]/xls/record/labelSst.go:43 +0x70
github.com/shakinm/xlsReader/xls/record.(*Format).GetFormatString(0xc0000f08b8, 0x1099ba0, 0xc00046d0e0, 0x1733f00, 0x1)
	/opt/pr/gocode/pkg/mod/github.com/shakinm/[email protected]/xls/record/format.go:125 +0xa85
github.com/dfurmanov/myapp/officetotxt.XLStoCSV(0x1082740, 0xc00077c330, 0xc00038d560, 0x29, 0x0, 0xc00038d57f, 0x6)

shakinm self-assigned this on Nov 2, 2020
shakinm added the bug (Something isn't working) label on Nov 3, 2020
@shakinm
Owner

shakinm commented Nov 3, 2020

@dfurmanov Thanks for finding this bug!
I tested the application with your XLS file and found that the SST records are not being read correctly. However, if you re-save the file and rerun the test, everything is fine.
It will take me some time to fix this error because I don't have much free time right now, but I will try to do it as quickly as possible.

@dfurmanov
Author

@shakinm Unfortunately, I have no control over the files I am processing with this library, so hopefully the fix will be out soon. Thank you very much for working on this!
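
Until a fix lands, a caller who cannot control the input files may want to keep one malformed workbook from crashing the whole process. The following is a minimal sketch, not part of xlsReader: `safeString` is a hypothetical helper that uses Go's standard `recover` to turn a panic such as the index-out-of-range above into an ordinary error.

```go
package main

import "fmt"

// safeString is a hypothetical helper (not part of xlsReader). It runs fn and,
// if fn panics, returns the panic value wrapped in an error instead of crashing.
func safeString(fn func() string) (s string, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("cell formatting panicked: %v", r)
		}
	}()
	return fn(), nil
}

func main() {
	// Stand-in for a shared string table that is shorter than a cell's index.
	table := []string{"a", "b"}
	idx := 5

	v, err := safeString(func() string { return table[idx] })
	fmt.Printf("value=%q err=%v\n", v, err)
}
```

In the XLStoCSV loop above, the call `format.GetFormatString(cell)` could be wrapped the same way, with the error returned to the caller. This only contains the crash; the affected cells are still not read.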

kleeon added a commit to kleeon/xlsReader that referenced this issue Oct 13, 2022
@kleeon
Contributor

kleeon commented Oct 13, 2022

I've been looking into this issue for the past day and found a couple of things.
Firstly, there is a bug on this line:

if cch >= (len(s.RgbSrc)-3)/(1+int(grbit&1)) || s.ByteLen > 0 {

The header size here is hardcoded to 3, when the size can actually vary depending on the flags set. When I fix it, @dfurmanov's file opens correctly.
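
To illustrate why a fixed 3 is not enough, here is a rough sketch of computing the header size from the flags. It assumes the usual BIFF8 string layout: a 2-byte character count, a 1-byte flags field, then an optional 2-byte formatting-run count (rich-text flag, 0x08) and an optional 4-byte extended-data size (flag 0x04). The names `sstHeaderSize` and `charsAvailable` are made up for this example and are not taken from xlsReader or from the actual fix.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// Bits of the string's option-flags byte (assumed BIFF8 layout).
const (
	flagHighByte = 0x01 // characters are 2 bytes each (UTF-16LE) instead of 1
	flagExtSt    = 0x04 // a 4-byte extended-data size follows the header
	flagRichSt   = 0x08 // a 2-byte formatting-run count follows the header
)

// sstHeaderSize returns the number of bytes that precede the character data:
// 2 (char count) + 1 (flags) + optional 2 (run count) + optional 4 (ext size).
func sstHeaderSize(grbit byte) int {
	size := 3
	if grbit&flagRichSt != 0 {
		size += 2
	}
	if grbit&flagExtSt != 0 {
		size += 4
	}
	return size
}

// charsAvailable reports the declared character count and how many whole
// characters actually fit in buf, where buf starts at the char-count field.
func charsAvailable(buf []byte) (cch, fit int, err error) {
	if len(buf) < 3 {
		return 0, 0, fmt.Errorf("short string header: %d bytes", len(buf))
	}
	cch = int(binary.LittleEndian.Uint16(buf[0:2]))
	grbit := buf[2]
	hdr := sstHeaderSize(grbit)
	if len(buf) < hdr {
		return cch, 0, fmt.Errorf("header needs %d bytes, have %d", hdr, len(buf))
	}
	bytesPerChar := 1 + int(grbit&flagHighByte)
	fit = (len(buf) - hdr) / bytesPerChar
	if fit > cch {
		fit = cch
	}
	return cch, fit, nil
}

func main() {
	// Rich-text string "hello": count=5, flags=0x08, run count=1, then the text.
	buf := []byte{5, 0, flagRichSt, 1, 0, 'h', 'e', 'l', 'l', 'o'}
	cch, fit, err := charsAvailable(buf)
	fmt.Println(cch, fit, err)
}
```

With a fixed header size of 3, the two run-count bytes in this buffer would be counted as character data, so the check for whether the string continues into the next record can give the wrong answer.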

However, this did not fix all of my problems, because for some documents it still fails to read strings broken up by a CONTINUE record. The Microsoft docs claim that a CONTINUE record has to contain a flags field as its first byte; however, I found some documents where the flags byte is missing if the CONTINUE record begins in the formatting runs. I checked the OpenOffice XLS documentation, and it does indeed mention this quirk (section 5.21):

Formatting runs (➜2.5.1) cannot be split between their components (character index and FONT record index). If a string is split between two formatting runs, the option flags field will not be repeated in the CONTINUE record.

Seems to be poor documentation on Microsoft's part. When I fixed this issue, I no longer had any problems with reading SST records.
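
The quirk can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the code from the PR: `chunkReader` and `readString` are hypothetical names, the string header is assumed to have been parsed already, and extended data is ignored. The point is that a flags byte is re-read only when a record boundary falls inside the character data, and not when it falls inside the formatting runs.

```go
package main

import (
	"errors"
	"fmt"
	"unicode/utf16"
)

// chunkReader walks an SST record body followed by its CONTINUE bodies.
type chunkReader struct {
	chunks [][]byte
	ci     int // index of the current chunk
	off    int // read offset inside the current chunk
}

// next moves past exhausted chunks and reports whether a boundary was crossed.
func (r *chunkReader) next() (crossed bool, err error) {
	for r.ci < len(r.chunks) && r.off == len(r.chunks[r.ci]) {
		r.ci++
		r.off = 0
		crossed = true
	}
	if r.ci >= len(r.chunks) {
		return crossed, errors.New("unexpected end of SST data")
	}
	return crossed, nil
}

// readString reads cch characters and then skips runBytes of formatting runs.
func readString(r *chunkReader, cch int, highByte bool, runBytes int) (string, error) {
	units := make([]uint16, 0, cch)
	for len(units) < cch {
		crossed, err := r.next()
		if err != nil {
			return "", err
		}
		chunk := r.chunks[r.ci]
		if crossed {
			// Boundary inside character data: the CONTINUE body starts with a
			// fresh flags byte that may switch between 1- and 2-byte characters.
			highByte = chunk[r.off]&0x01 != 0
			r.off++
			continue
		}
		if !highByte {
			units = append(units, uint16(chunk[r.off]))
			r.off++
			continue
		}
		if r.off+2 > len(chunk) {
			return "", errors.New("2-byte character split across records")
		}
		units = append(units, uint16(chunk[r.off])|uint16(chunk[r.off+1])<<8)
		r.off += 2
	}
	// Boundary inside the formatting runs: no flags byte is repeated here.
	for remaining := runBytes; remaining > 0; {
		if _, err := r.next(); err != nil {
			return "", err
		}
		n := len(r.chunks[r.ci]) - r.off
		if n > remaining {
			n = remaining
		}
		r.off += n
		remaining -= n
	}
	return string(utf16.Decode(units)), nil
}

func main() {
	r := &chunkReader{chunks: [][]byte{
		{'H', 'e'},                          // 1-byte characters
		{0x01, 'l', 0, 'l', 0, 'o', 0},      // CONTINUE: flags byte, then UTF-16LE
		{0x00, 0x00, 0x01, 0x00, 0xAA},      // CONTINUE: one formatting run, no flags byte
	}}
	s, err := readString(r, 5, false, 4)
	fmt.Println(s, err, r.ci, r.off)
}
```

A reader that always expects a flags byte at the start of a CONTINUE record would consume the first byte of the formatting run in the third chunk and end up one byte out of step for every string that follows.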

Going to do a PR.
