Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ODBC Error: Invalid character value for cast specification #327

Open
RickL18393 opened this issue Nov 19, 2024 · 22 comments
Open

ODBC Error: Invalid character value for cast specification #327

RickL18393 opened this issue Nov 19, 2024 · 22 comments

Comments

@RickL18393
Copy link

I am using ODBC driver version 8.15.1-windows-x86_64 connecting to an ElasticSearch server running version 8.15.1. When I attempt to query data from an ElasticSearch table I am getting this ODBC error: [Elastic][EsODBC 8.15.1(v8.15.0-1-g43209d3,u,r) Driver]Invalid character value for cast specification;

When I turn on ODBC error logging I get this message: Mon Nov 18 12:12:53 2024 - [ERROR] sql2c_string()@convert.c:3489 [STMT@0x0000017FEFACF0B0] can't convert Soham Sarkar to double.

Is there a way to determine which column the bad data is in from the ODBC error message?

Let me know if you need any more information.

Thanks,
Rick L.

@bpintea
Copy link
Collaborator

bpintea commented Nov 19, 2024

You should get a message like column #<number> is bound, copying data. before the failure message, which should help you id the column.
Please feel free to use https://discuss.elastic.co/ for getting help, or open issues here if you encounter defects. Thanks!

@bpintea bpintea closed this as not planned Won't fix, can't repro, duplicate, stale Nov 19, 2024
@RickL18393
Copy link
Author

RickL18393 commented Nov 19, 2024 via email

@bpintea
Copy link
Collaborator

bpintea commented Nov 19, 2024

Which logging level should I use?

Debug

@elastic elastic deleted a comment from RickL18393 Dec 2, 2024
@RickL18393
Copy link
Author

In the attached log I found the error message, Wed Nov 20 07:20:57 2024 - [ERROR] sql2c_string()@convert.c:3489 [STMT@0x00000230C2E6C6F0] can't convert Soham Sarkar to double, at row 7594. Looking at the previous rows, it appears that the #3 column contains the invalid data. However, I have checked the third and fourth (In case this is a zero based index) columns and neither of them are defined as a numeric data type so I do not understand why the ElasticSearch ODBC driver is attempting to convert a character string into a double.

A little more background, I am not pulling all the columns available in the table. I am assuming that the column # is assigned to the third column I selected and not the actual third column in the table. Is this correct? I have checked the actual third column in the table, which is a REAL/float data type, but it does not contain any character string data that I can see.

Am I reading this log correctly?

Thanks,
Rick L.
esodbc_20241120072056_8c37f8d3110e45509651c681c5b9d344.ece.optum.com_9243_20420-1.log

@bpintea
Copy link
Collaborator

bpintea commented Dec 12, 2024

The logs contain the following relevant entries:

Wed Nov 20 07:20:57 2024 - [DEBUG] SQLGetData()@odbc.c:947 ENTER: *StatementHandle=@0x00000230C2E6C6F0, ColumnNumber=3, TargetType=7, *TargetValuePtr=@0x00000230C3A60534, BufferLength=51, *StrLen_or_IndPtr=@0x000000BA493FDF48.
...
Wed Nov 20 07:20:57 2024 - [DEBUG] get_rec_c_type()@convert.c:581 [DESC@0x000000BA493F5470] AxD data C type: 7.
Wed Nov 20 07:20:57 2024 - [DEBUG] sql2c_string()@convert.c:3486 [STMT@0x00000230C2E6C6F0] string `Soham Sarkar` converted to dbl=0.000000e+00.
Wed Nov 20 07:20:57 2024 - [ERROR] sql2c_string()@convert.c:3489 [STMT@0x00000230C2E6C6F0] can't convert `Soham Sarkar` to double.

The first line logs the parameters received from the application, with no manipulation (i.e. before copying anything in internal structures). The key element there is TargetType=7, which means the application wants to have the result provided as an SQL real type (SQL_C_FLOAT, which is same as SQL_REAL).

The ColumnNumber=3 means the 3rd column in the result set (not the source table), 1-based. (The function's documented here.)

@RickL18393
Copy link
Author

RickL18393 commented Dec 12, 2024 via email

@bpintea
Copy link
Collaborator

bpintea commented Dec 13, 2024

@RickL18393, your email didn't make it through as you might have sent it. Have a look over the web, please.

@RickL18393
Copy link
Author

I am using Microsoft SQL Server SSIS (also known as Microsoft SQL Server Data Tools) 2019 to pull the data from ElasticSearch. Looking at the ODBC Source task, the third result set column (called Output Column in SSIS) is set to data type text stream [DT_TEXT]:

Image

However, when I look at the External Columns, the third column is amountExpected, which is data type float [DT_R4]:

Image

I am not trying to pull amountExpected. Is it possible that the ODBC driver is actually looking at the External Columns, which are all the columns available in the index, instead of just the columns I selected which are represented by the Output Columns?

I also checked the amountExpected column using DBVis and the string that is causing the error is not in any of the rows for that column.

Let me know if you need any more information.

Thanks,
Rick L.

@bpintea
Copy link
Collaborator

bpintea commented Dec 17, 2024

the third result set column (called Output Column in SSIS) is set to data type text stream [DT_TEXT]:

Is that autodetected, or does the user set that?

Is it possible that the ODBC driver is actually looking at the External Columns, which are all the columns available in the index

No, the driver doesn't have access to any of that, it's just a proxy between the ODBC API / application and Elasticsearch'es SQL REST API and doesn't hold state more than connection and results-related.

The query is contained in the log (truncated excerpt):

Wed Nov 20 07:20:56 2024 - [WARN] EsSQLExecute()@queries.c:3723 [STMT@0x00000230C2E6C6F0] query `select "addnlDetails" ,"amountExpected" ,"analyst" , 

and so are the columns of the response (0-based indexing):

Wed Nov 20 07:20:57 2024 - [INFO] attach_columns_cbor()@queries.c:629 [STMT@0x00000230C2E6C6F0] columns received: 331.
Wed Nov 20 07:20:57 2024 - [DEBUG] update_rec_count()@handles.c:1470 [DESC@0x00000230C2E6CFF8] recs array is growing 0 -> 331.
Wed Nov 20 07:20:57 2024 - [DEBUG] attach_one_column()@queries.c:329 [STMT@0x00000230C2E6C6F0] column #0: name=`addnlDetails`, type=-9 (`text`).
Wed Nov 20 07:20:57 2024 - [DEBUG] attach_one_column()@queries.c:329 [STMT@0x00000230C2E6C6F0] column #1: name=`amountExpected`, type=7 (`float`).
Wed Nov 20 07:20:57 2024 - [DEBUG] attach_one_column()@queries.c:329 [STMT@0x00000230C2E6C6F0] column #2: name=`analyst`, type=-9 (`text`).

But as written above, SSIS requests for some reason the output to be converted to a floating point number (1-based):

Wed Nov 20 07:20:57 2024 - [DEBUG] SQLGetData()@odbc.c:947 ENTER: *StatementHandle=@0x00000230C2E6C6F0, ColumnNumber=3, TargetType=7, *TargetValuePtr=@0x00000230C3A60534, BufferLength=51, *StrLen_or_IndPtr=@0x000000BA493FDF48.

I also checked the amountExpected column using DBVis and the string that is causing the error is not in any of the rows for that column.

Indeed, the string that can't be converted to a number is sourced from the analyst field/column.

Does it work loading this data in other apps (like Excel or some other SQL editor, like DbVisualizer)?
What happens in SSIS if you limit the loaded fields/columns to only string ones, like just the first and third?
You could also try to set the Varchar limit to a non-0 value and observe if SSIS handles the field differently.

@RickL18393
Copy link
Author

Is that autodetected, or does the user set that?

That column is autodetected as Unicode text stream [DT_NTEXT]. We do not need Unicode, so I changed it to text stream [DT_TEXT]

But as written above, SSIS requests for some reason the output to be converted to a floating point number (1-based):

I would suggest that this is the bug we are chasing, whether it be in the ODBC driver or SSIS.

Indeed, the string that can't be converted to a number is sourced from the analyst field/column.

Thank you for the clarification. That column does contain the string in question.

Does it work loading this data in other apps (like Excel or some other SQL editor, like DbVisualizer)?

I have not tried to pull the data with any other applications. My DbVis connection is using the JDBC driver, so I don't think that
would be a good comparison.

What happens in SSIS if you limit the loaded fields/columns to only string ones, like just the first and third?

I created a new Data Flow task and only selected the addnl and anaylst columns. I am now getting a different set of error
messages:

[ODBC Source [2]] Error: SQLSTATE: HY000, Message: [Elastic][EsODBC 8.15.1(v8.15.0-1-g43209d3,u,r) Driver]Invalid server answer;

[ODBC Source [2]] Error: SQLSTATE: HY000, Message: [Elastic][EsODBC 8.15.1(v8.15.0-1-g43209d3,u,r) Driver]Invalid server answer;

[ODBC Source [2]] Error: SSIS Error Code DTS_E_INDUCEDTRANSFORMFAILUREONERROR. The "ODBC Source.Outputs[ODBC
Source Output]" failed because error code 0xC020F450 occurred, and the error row disposition on "ODBC Source" specifies failure
on error. An error occurred on the specified object of the specified component. There may be error messages posted before this
with more information about the failure.

[SSIS.Pipeline] Error: SSIS Error Code DTS_E_PRIMEOUTPUTFAILED. The PrimeOutput method on ODBC Source returned error
code 0xC0209029. The component returned a failure code when the pipeline engine called PrimeOutput(). The meaning of the
failure code is defined by the component, but the error is fatal and the pipeline stopped executing. There may be error messages
posted before this with more information about the failure.

You could also try to set the Varchar limit to a non-0 value and observe if SSIS handles the field differently.

Varchar Limit is already set to 8000.

@bpintea
Copy link
Collaborator

bpintea commented Dec 17, 2024

I changed it to text stream [DT_TEXT]

Is DT_WSTR/DT_STR making any difference? (Not very familiar with SSIS though.)

this is the bug we are chasing

Could be a bug, could also be a misconfiguration.

I am now getting a different set of error messages

Could you share the log files of the driver in this case too?

@RickL18393
Copy link
Author

I have deleted and recreated my SSIS package. I am no longer trying to change data types in the ODBC Source task, so all columns from the Elasticsearch index are being pulled as the data type that was auto detected. I am still getting this error:

[ODBC Source [14]] Error: SQLSTATE: 22018, Message: [Elastic][EsODBC 8.15.1(v8.15.0-1-g43209d3,u,r) Driver]Invalid character value for cast specification;

Running the SSIS package created 6 log files, the majority of which are over 353MB. I put them in a .zip file which got them down to 40MB, but this website will not upload the file. I assume it is too large. I have attached the smallest file. Hopefully that will give you the information you need. If it doesn't, than let me know where I can upload the .zip file.

Thanks,
Rick L.

esodbc_20241218105733_8c37f8d3110e45509651c681c5b9d344.ece.optum.com_9243_8732-1.log

@RickL18393
Copy link
Author

FYI, I will be on vacation starting 12/20/2024 returning to the office on 1/6/2025.

Rick L.

@bpintea
Copy link
Collaborator

bpintea commented Dec 19, 2024

Hopefully that will give you the information you need.

Unfortunately the log contains the same failure as before, so there's nothing new.

Regarding the above message, I'd be interesting to know the answer for the 1st one and if there are logs for the 3rd point there.

The reasons being:

  • for some reason SSIS tries to get that string column as a number. I'd be good to know if one can force a different type. Maybe besides the string ones mentioned there, try something else, even if it doesn't make sense, like dates or Boolean. Having the logs in these cases too would help.
  • "Invalid server answer;" seems a serious enough failure that at first sight should not happen. It'd be good to understand what's going on there.

If the logs will be too large achieved in this case too, I will provide you with a resource where to upload them.

@RickL18393
Copy link
Author

The autodetected data type for the analyst column is DT_NTEXT in my new SSIS package, so I don't think that makes a difference:

Image

I deleted the original SSIS package and logs that showed the Invalid server answer error, however I was able to recreate the SSIS package and got the same error. The logs are below:

ESInvalidServerAnswerLogs.zip

I am going to be busy on another project the rest of today and as I stated in my earlier comment I am on vacation starting tomorrow, 12/20/2024. I will be back on 01/06/2024.

Thanks,
Rick L.

@bpintea
Copy link
Collaborator

bpintea commented Dec 20, 2024

I was able to recreate the SSIS package and got the same error.

Thanks. There's something quite off there. I can see in the logs the query with two columns:

Thu Dec 19 11:49:36 2024 - [DEBUG] SQLPrepareW()@odbc.c:571 EXIT: ret=0, *hstmt=@0x00000221D0F48AD0, *szSqlStr=[56] `select "addnlDetails" ,"analyst" from "settlement_stage"`, cchSqlStr=-3.

but then the app asks for a 4th column:

Thu Dec 19 11:49:36 2024 - [DEBUG] SQLGetData()@odbc.c:947 ENTER: *StatementHandle=@0x00000221D0F48AD0, ColumnNumber=4, TargetType=-8, *TargetValuePtr=@0x000000CA90FFE2A4, BufferLength=0, *StrLen_or_IndPtr=@0x000000CA90FFE2E0.

The driver assumes a server error (as it's not parsing the query to understand how many columns to expect), but in this case it's an application error.

Might need to look further to the logs and SSIS deployment options, but that might not happen timely and it seems very specific to this deployment in particular.

@RickL18393
Copy link
Author

RickL18393 commented Jan 6, 2025 via email

@RickL1234
Copy link

I got an email that is the same as the comment you left right before I went on vacation. What do we need to do to move forward on getting this driver to work with SSIS? ElasticSearch online documentation states that the ElasticSearch ODBC driver is compatible with SSIS. I need a resolution to this issue ASAP, it is holding up a critical project.

Thanks,
Rick L.

@bpintea
Copy link
Collaborator

bpintea commented Jan 16, 2025

As written above, in your installation there's something seriously misaligned. From the logs I see you're using the 64bit version of the driver -- is SSIS on same architecture?

Also, during the exercise of configuring two columns only (as before), it might be useful to enable SSIS logging and check (or share) these logs for further clues as to why the interoperability fails.

@RickL1234
Copy link

RickL1234 commented Jan 17, 2025

We are using the commercial version of ElasticSearch. Is there some place where I can get more timely replies?

I can also tell you that I have quite a few SSIS packages that use ODBC drivers from other vendors that work just fine. That leads me to believe that the issue is with the ElasticSearch ODBC driver and not SSIS. If I am not mistaken, ODBC is a standard defined by Microsoft, so all ODBC drivers should look and act the same with SSIS.

What is the latest version of SSIS that the driver has been tested against?

{Added] I've been doing research on another issue related to ElasticSearch and I see that the ODBC driver may need to be licensed. Could this be causing these problems?

Thanks,
Rick L.

@bpintea
Copy link
Collaborator

bpintea commented Jan 20, 2025

Is there some place where I can get more timely replies?

This is the development repository of the driver. If you have a commercial contact with Elastic, they might be able to assist you on the support steps.

That leads me to believe that the issue is with the ElasticSearch ODBC driver and not SSIS.

I have found no evidence in the shared logs of any fault on the Elasticsearch ODBC driver part so far.

the ODBC driver may need to be licensed. Could this be causing these problems?

No, if the Elasticsearch cluster is running on the required license, the ODBC driver will connect, which wouldn't happen with an unfitting/expired license.

However:

I have just tried the latest driver release[*] on a SQL Server 2022 with Integration Services and successfully imported data from an Elasticsearch index over Microsoft ODE DB Providers for SQL Server into a local SQL Server database. The types are indeed not correctly autodetected so the Column Mappings need editing. Notably, the keyword type worked when mapped over the text SQL Server type, although others might work too. Other types might require tweaking as well, but float Elasticsearch type caused no issues.
(Interestingly, the Preview... functionality renders the correct data even if the import would later fail due to mismatched mapping).

I would normally recommend starting small, maybe writing an Elasticsearch SQL query to only select a few columns (instead of importing a whole index), figure the required mappings and add progressively more columns/fields, although I do admit that the logs you've shared above do show something quite off, as written above. This might still be a misconfiguration, however.

[*] The driver hasn't been updated since 8.15.1 that you're using.

@RickL1234
Copy link

It appears that the issue is resolved. I took your advice about starting with one column and building up. The first thing I did, however, was to created an entirely new Visual Studio solution for this project. Last time I only created a new project within the solution I was already using.

Anyway, the new solution appears to have resolved the issue and I am able to pull back all data from the index I've been working with. Thank you for your patience working through this with me!

Rick L.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants