[Bug]: [SWE-bench] Failed to cd. ModuleNotFoundError #231

Open
kevin-support-bot bot opened this issue Jan 24, 2025 · 19 comments

Comments

@kevin-support-bot

All-Hands-AI#6431 Issue


@BIJOY-SUST, is this issue specific to OpenHands version 0.18.0? That version is not the latest.

@BIJOY-SUST

specific to OpenHands version 0.18.0

@SmartManoj
Owner

Could you let me know why you're using that version?

@BIJOY-SUST

BIJOY-SUST commented Jan 25, 2025

I’ve been using version 0.18.0 since its release. However, after recently switching to version 0.21.0, I encountered the following issue:

Instance scikit-learn__scikit-learn-25500 - 2025-01-25 01:59:42,199 - ERROR - ----------
Error in instance [scikit-learn__scikit-learn-25500]: Failed to cd to /workspace/scikit-learn__scikit-learn__1.3: **CmdOutputObservation (source=None, exit code=-1, metadata={
  "exit_code": -1,
  "pid": -1,
  "username": null,
  "hostname": null,
  "working_dir": null,
  "py_interpreter_path": null,
  "prefix": "[Below is the output of the previous command.]\n",
  "suffix": "\n[Your command \"cd /workspace/scikit-learn__scikit-learn__1.3\" is NOT executed. The previous command is still running - You CANNOT send new commands until the previous command is completed. By setting `is_input` to `true`, you can interact with the current process: You may wait longer to see additional output of the previous command by sending empty command '', send other commands to interact with the current process, or send keys (\"C-c\", \"C-z\", \"C-d\") to interrupt/kill the previous command before sending your new command.]"
-------------------------------
-------------------------------
    raise EvalException(msg)
evaluation.utils.shared.EvalException: Failed to cd to /workspace/scikit-learn__scikit-learn__1.3: **CmdOutputObservation (source=None, exit code=-1, metadata={

@SmartManoj
Owner

SmartManoj commented Jan 25, 2025

Would you run ls /testbed and ls /workspace in that container? Here, the folder is copied.
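For reference, the requested check could be scripted roughly like this. It is a minimal sketch that assumes the Docker CLI is on PATH; the container name is a hypothetical placeholder to be replaced with the real runtime container, and the helper names are illustrative, not part of OpenHands.

```python
import shutil
import subprocess


def ls_commands(container: str, dirs: list[str]) -> list[list[str]]:
    """Build one `docker exec <container> ls <dir>` argv per directory."""
    return [["docker", "exec", container, "ls", d] for d in dirs]


def list_dirs(container: str, dirs: list[str]) -> None:
    """Run each `ls` inside the container and print its output or its error."""
    if shutil.which("docker") is None:
        print("docker CLI not found on PATH")
        return
    for cmd in ls_commands(container, dirs):
        result = subprocess.run(cmd, capture_output=True, text=True)
        print("$", " ".join(cmd))
        print(result.stdout if result.returncode == 0 else result.stderr)


if __name__ == "__main__":
    # Hypothetical placeholder: replace with the actual runtime container name.
    list_dirs("openhands-runtime-EXAMPLE", ["/testbed", "/workspace"])
```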

@BIJOY-SUST

BIJOY-SUST commented Jan 25, 2025

I couldn't identify the specific container ID from the error, and running docker ps -a lists many containers, so checking each of them isn't feasible. However, I found that this error occurs only when I run more than one worker; with a single worker, it works fine. I think the issue is caused by running workers in parallel.
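To avoid checking every container by hand, a sketch like the following narrows docker ps -a down to the OpenHands runtime containers. It assumes the Docker CLI is available and that the runtime containers share the openhands-runtime prefix seen in the logs in this thread; docker ps --filter name=... matches names containing the given string, and the extra check in the hypothetical parse_names helper keeps only names that start with the prefix.

```python
import shutil
import subprocess

PREFIX = "openhands-runtime"  # prefix observed in the container names in these logs


def parse_names(ps_output: str, prefix: str = PREFIX) -> list[str]:
    """Keep only non-empty lines of `docker ps` output that start with prefix."""
    return [
        line.strip()
        for line in ps_output.splitlines()
        if line.strip().startswith(prefix)
    ]


def runtime_containers(prefix: str = PREFIX) -> list[str]:
    """List names of all (running or stopped) containers matching prefix."""
    if shutil.which("docker") is None:
        return []
    result = subprocess.run(
        ["docker", "ps", "-a", "--filter", f"name={prefix}", "--format", "{{.Names}}"],
        capture_output=True,
        text=True,
        check=False,
    )
    return parse_names(result.stdout, prefix)


if __name__ == "__main__":
    for name in runtime_containers():
        print(name)
```

Each returned name can then be inspected individually, for example with docker logs or docker exec.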

Another note: with parallel workers, I noticed another kind of issue:

================ DOCKER BUILD STARTED ================
Instance sympy__sympy-24152 - 2025-01-25 04:46:05,989 - ERROR - [runtime ee4b4d13-2e77-413b-92ba-2e2e51510a2d-7e0041aa3ba95507] Error: Instance openhands-runtime-ee4b4d13-2e77-413b-92ba-2e2e51510a2d-7e0041aa3ba95507 FAILED to start container!

Instance sympy__sympy-24152 - 2025-01-25 04:46:05,990 - ERROR - [runtime ee4b4d13-2e77-413b-92ba-2e2e51510a2d-7e0041aa3ba95507] 500 Server Error for http+docker://localhost/v1.47/containers/5e287512107eb31936ba37073688916026b3a861c248dac4dad0ebe42f7dd63b/start: Internal Server Error ("driver failed programming external connectivity on endpoint openhands-runtime-ee4b4d13-2e77-413b-92ba-2e2e51510a2d-7e0041aa3ba95507 (c278fa626eea3ce95fa977b0e674602297f8d5c4221b9aba88febf265d475c4c): failed to bind port 0.0.0.0:58226/tcp: Error starting userland proxy: listen tcp4 0.0.0.0:58226: bind: address already in use")
Instance sympy__sympy-24152 - 2025-01-25 04:46:06,010 - ERROR - ----------
Error in instance [sympy__sympy-24152]: 500 Server Error for http+docker://localhost/v1.47/containers/5e287512107eb31936ba37073688916026b3a861c248dac4dad0ebe42f7dd63b/start: Internal Server Error ("driver failed programming external connectivity on endpoint openhands-runtime-ee4b4d13-2e77-413b-92ba-2e2e51510a2d-7e0041aa3ba95507 (c278fa626eea3ce95fa977b0e674602297f8d5c4221b9aba88febf265d475c4c): failed to bind port 0.0.0.0:58226/tcp: Error starting userland proxy: listen tcp4 0.0.0.0:58226: bind: address already in use"). Stacktrace:
Traceback (most recent call last):
  File "/home/user/.cache/pypoetry/virtualenvs/openhands-ai-lecMOyrf-py3.12/lib/python3.12/site-packages/docker/api/client.py", line 275, in _raise_for_status
    response.raise_for_status()
  File "/home/user/.cache/pypoetry/virtualenvs/openhands-ai-lecMOyrf-py3.12/lib/python3.12/site-packages/requests/models.py", line 1024, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: http+docker://localhost/v1.47/containers/5e287512107eb31936ba37073688916026b3a861c248dac4dad0ebe42f7dd63b/start

The above exception was the direct cause of the following exception:

I changed sandbox_config.py according to a comment in a different issue thread. This is the current version:

    # remote_runtime_api_url: str = Field(default='http://localhost:8000')
    remote_runtime_api_url: str | None = Field(default=None)

@SmartManoj
Owner

SmartManoj commented Jan 26, 2025

First issue: another container may be using the same port and may have already started, so the check looks for the folder in that other container.

Root cause: when running multiple workers, the same port is being reused.
Here, give each instance a unique port using a dictionary.
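The collision described above can be reproduced in miniature with plain sockets. This is an illustration on 127.0.0.1, not OpenHands code: once one socket is listening on a port, a second bind to the same port fails with EADDRINUSE, the same "address already in use" condition that Docker reports in the sympy__sympy-24152 log above.

```python
import errno
import socket


def second_bind_fails() -> bool:
    """Return True if binding a second socket to an in-use port raises EADDRINUSE."""
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        first.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        first.listen()
        port = first.getsockname()[1]
        try:
            second.bind(("127.0.0.1", port))  # same port as the first socket
        except OSError as exc:
            return exc.errno == errno.EADDRINUSE
        return False
    finally:
        first.close()
        second.close()


print("second bind failed with EADDRINUSE:", second_bind_fails())
```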

@BIJOY-SUST

BIJOY-SUST commented Jan 26, 2025

I will try to provide custom ports in the code section you mentioned. Thank you.

Apart from this, I am facing another issue: when running inference on SWE-bench Lite, I get a "too many open files" error:

Using OpenHands version 0.21.0.

raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', OSError(24, 'Too many open files'))
Exception ignored in atexit callback: <bound method DockerRuntime.close of <openhands.runtime.impl.docker.docker_runtime.DockerRuntime object at 0x7ff4c5edaff0>>
Traceback (most recent call last):
  File "project/openhands/runtime/impl/docker/docker_runtime.py", line 371, in close
    stop_all_containers(close_prefix)
  File "project/openhands/runtime/impl/docker/containers.py", line 7, in stop_all_containers
    containers = docker_client.containers.list(all=True)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/project/.cache/pypoetry/virtualenvs/openhands-ai-xxSbwZmD-py3.12/lib/python3.12/site-packages/docker/models/containers.py", line 1018, in list
    containers.append(self.get(r['Id']))
                      ^^^^^^^^^^^^^^^^^
  File "/project/.cache/pypoetry/virtualenvs/openhands-ai-xxSbwZmD-py3.12/lib/python3.12/site-packages/docker/models/containers.py", line 954, in get
    resp = self.client.api.inspect_container(container_id)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/project/.cache/pypoetry/virtualenvs/openhands-ai-xxSbwZmD-py3.12/lib/python3.12/site-packages/docker/utils/decorators.py", line 19, in wrapped
    return f(self, resource_id, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/project/.cache/pypoetry/virtualenvs/openhands-ai-xxSbwZmD-py3.12/lib/python3.12/site-packages/docker/api/container.py", line 794, in inspect_container
    self._get(self._url("/containers/{0}/json", container)), True
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/project/.cache/pypoetry/virtualenvs/openhands-ai-xxSbwZmD-py3.12/lib/python3.12/site-packages/docker/utils/decorators.py", line 44, in inner
    return f(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/project/.cache/pypoetry/virtualenvs/openhands-ai-xxSbwZmD-py3.12/lib/python3.12/site-packages/docker/api/client.py", line 246, in _get
    return self.get(url, **self._set_request_timeout(kwargs))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/project/.cache/pypoetry/virtualenvs/openhands-ai-xxSbwZmD-py3.12/lib/python3.12/site-packages/requests/sessions.py", line 602, in get
    return self.request("GET", url, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/project/.cache/pypoetry/virtualenvs/openhands-ai-xxSbwZmD-py3.12/lib/python3.12/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/project/.cache/pypoetry/virtualenvs/openhands-ai-xxSbwZmD-py3.12/lib/python3.12/site-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/project/.cache/pypoetry/virtualenvs/openhands-ai-xxSbwZmD-py3.12/lib/python3.12/site-packages/requests/adapters.py", line 682, in send
    raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', OSError(24, 'Too many open files'))
Exception ignored in atexit callback: <function stop_all_runtime_containers at 0x7ff5763272e0>
Traceback (most recent call last):
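A common general mitigation for OSError(24, 'Too many open files') is to raise the process's soft limit on open file descriptors toward its hard limit before launching many workers. This is not confirmed in this thread as the fix for this run; the sketch below uses Python's standard resource module (Unix only), and the default target of 65536 is an arbitrary assumption.

```python
import resource


def raise_nofile_limit(target: int = 65536) -> int:
    """Raise the soft RLIMIT_NOFILE toward target, capped at the hard limit.

    Returns the soft limit in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)  # an unprivileged process cannot exceed the hard limit
    if soft != resource.RLIM_INFINITY and soft < target:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
        except (ValueError, OSError):
            pass  # leave the limit unchanged if the OS refuses the new value
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]


if __name__ == "__main__":
    print("open-file soft limit:", raise_nofile_limit())
```

The shell equivalent is running ulimit -n with a higher value in the same shell before starting the inference script.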

@BIJOY-SUST

BIJOY-SUST commented Jan 31, 2025

Thanks for your earlier response. Could you please provide more details on how to address this “Failed to cd” issue? The “Failed to cd” error occurs quite frequently.

@SmartManoj
Owner

> First issue: another container may be using the same port and may have already started, so the check looks for the folder in that other container.

It's due to the port conflict.
Suppose there are two instances, A and B.
If both use the same port P, and container B starts before container A,
then program A will check inside container B, and the "failed to cd" error occurs.
Later, when container A tries to start, the "failed to bind" error occurs.

Resolution for both issues: a unique port for each instance.

@SmartManoj
Copy link
Owner

SmartManoj commented Jan 31, 2025

Snippet to map each instance to its own port:

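from datasets import load_dataset

# Load the SWE-bench Lite test split (cached under ./cache).
dataset = load_dataset(
    'princeton-nlp/SWE-bench_Lite',
    cache_dir='./cache',
    verification_mode='no_checks',
    num_proc=4,
    split='test',
)

# Assign consecutive ports starting at base_port, one per instance,
# so that parallel workers never bind the same host port.
base_port = 63000
port_mapping = {}
for i, instance in enumerate(dataset):
    port_mapping[instance['instance_id']] = base_port + i
print(port_mapping)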
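As a guard when the instance list or base port changes, the same mapping can be built with a couple of sanity checks. This is a sketch that takes instance IDs as a plain list so it runs without downloading the dataset; build_port_mapping and MAX_PORT are hypothetical names, not OpenHands APIs. It rejects duplicate IDs (which would share a port) and any port above 65535.

```python
MAX_PORT = 65535  # highest valid TCP port number


def build_port_mapping(instance_ids: list[str], base_port: int = 63000) -> dict[str, int]:
    """Map each instance ID to a unique port: base_port, base_port + 1, ..."""
    if len(set(instance_ids)) != len(instance_ids):
        raise ValueError("duplicate instance IDs would share a port")
    last_port = base_port + len(instance_ids) - 1
    if last_port > MAX_PORT:
        raise ValueError(
            f"{len(instance_ids)} instances starting at {base_port} exceed port {MAX_PORT}"
        )
    return {iid: base_port + i for i, iid in enumerate(instance_ids)}


ids = ["sympy__sympy-24152", "scikit-learn__scikit-learn-25500"]
print(build_port_mapping(ids))
# {'sympy__sympy-24152': 63000, 'scikit-learn__scikit-learn-25500': 63001}
```

This only guarantees uniqueness within the mapping; it does not check that the ports are actually free on the host.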

@SmartManoj
Owner

Did you check swebench_verified_mini?

SmartManoj added a commit that referenced this issue Jan 31, 2025
@SmartManoj
Owner

Would you apply this commit and check whether it resolves the issue for you?

@BIJOY-SUST

I’m applying this commit now. I’ll let you know whether it resolves the issue.

@BIJOY-SUST

BIJOY-SUST commented Jan 31, 2025

I applied the commit, but it didn’t resolve the “failed to cd” issue. Currently, I have two issues:

  1. “failed to cd”: I encountered the same issue after applying the mentioned commit.
"exit_code": -1,
"pid": -1,
"username": null,
"hostname": null,
"working_dir": null,
"py_interpreter_path": null,
"prefix": "[Below is the output of the previous command.]\n",
"suffix": "\n[Your command \"cd /workspace/django__django__3.2\" is NOT executed. The previous command is still running - You CANNOT send new commands until the previous command is completed. By setting `is_input` to `true`, you can interact with the current process: You may wait longer to see additional output of the previous command by sending empty command '', send other commands to interact with the current process, or send keys (\"C-c\", \"C-z\", \"C-d\") to interrupt/kill the previous command before sending your new command.]"
})**
--BEGIN AGENT OBSERVATION--
[Below is the output of the previous command.]

[Your command "cd /workspace/django__django__3.2" is NOT executed. The previous command is still running - You CANNOT send new commands until the previous command is completed. By setting `is_input` to `true`, you can interact with the current process: You may wait longer to see additional output of the previous command by sending empty command '', send other commands to interact with the current process, or send keys ("C-c", "C-z", "C-d") to interrupt/kill the previous command before sending your new command.]
--END AGENT OBSERVATION--. Stacktrace:
Traceback (most recent call last):
File "/project/evaluation/utils/shared.py", line 309, in _process_instance_wrapper
  result = process_instance_func(instance, metadata, use_mp, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/project/evaluation/benchmarks/swe_bench/run_infer.py", line 443, in process_instance
  return_val = complete_runtime(runtime, instance)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/project/evaluation/benchmarks/swe_bench/run_infer.py", line 332, in complete_runtime
  assert_and_raise(
File "/project/evaluation/utils/shared.py", line 286, in assert_and_raise
  raise EvalException(msg)
evaluation.utils.shared.EvalException: Failed to cd to /workspace/django__django__3.2: **CmdOutputObservation (source=None, exit code=-1, metadata={
"exit_code": -1,
"pid": -1,
"username": null,
"hostname": null,
"working_dir": null,
"py_interpreter_path": null,
"prefix": "[Below is the output of the previous command.]\n",
"suffix": "\n[Your command \"cd /workspace/django__django__3.2\" is NOT executed. The previous command is still running - You CANNOT send new commands until the previous command is completed. By setting `is_input` to `true`, you can interact with the current process: You may wait longer to see additional output of the previous command by sending empty command '', send other commands to interact with the current process, or send keys (\"C-c\", \"C-z\", \"C-d\") to interrupt/kill the previous command before sending your new command.]"
})**
--BEGIN AGENT OBSERVATION--
[Below is the output of the previous command.]
  2. I also encountered the following error: “container not found.” For more details, please refer to this link for the inference logs of this instance.
self._raise_for_status(response)
File "/project/.cache/pypoetry/virtualenvs/openhands-ai-PxnfiNA9-py3.12/lib/python3.12/site-packages/docker/api/client.py", line 277, in _raise_for_status
  raise create_api_error_from_http_exception(e) from e
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/project/.cache/pypoetry/virtualenvs/openhands-ai-PxnfiNA9-py3.12/lib/python3.12/site-packages/docker/errors.py", line 39, in create_api_error_from_http_exception
  raise cls(e, response=response, explanation=explanation) from e
docker.errors.NotFound: 404 Client Error for http+docker://localhost/v1.47/containers/openhands-runtime-scikit-learn__scikit-learn-15512/json: Not Found ("No such container: openhands-runtime-scikit-learn__scikit-learn-15512")

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/project/evaluation/utils/shared.py", line 309, in _process_instance_wrapper
  result = process_instance_func(instance, metadata, use_mp, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/project/evaluation/benchmarks/swe_bench/run_infer.py", line 418, in process_instance
  call_async_from_sync(runtime.connect)
File "/project/openhands/utils/async_utils.py", line 50, in call_async_from_sync
  result = future.result()
           ^^^^^^^^^^^^^^^
File "/project/.conda/envs/openhands_latest/lib/python3.12/concurrent/futures/_base.py", line 449, in result
  return self.__get_result()
         ^^^^^^^^^^^^^^^^^^^
File "/project/.conda/envs/openhands_latest/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
  raise self._exception
File "/project/.conda/envs/openhands_latest/lib/python3.12/concurrent/futures/thread.py", line 58, in run
  result = self.fn(*self.args, **self.kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/project/openhands/utils/async_utils.py", line 44, in run
  return asyncio.run(arun())
         ^^^^^^^^^^^^^^^^^^^
File "/project/.conda/envs/openhands_latest/lib/python3.12/asyncio/runners.py", line 194, in run
  return runner.run(main)
         ^^^^^^^^^^^^^^^^
File "/project/.conda/envs/openhands_latest/lib/python3.12/asyncio/runners.py", line 118, in run
  return self._loop.run_until_complete(task)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/project/.conda/envs/openhands_latest/lib/python3.12/asyncio/base_events.py", line 664, in run_until_complete
  return future.result()
         ^^^^^^^^^^^^^^^
File "/project/openhands/utils/async_utils.py", line 37, in arun
  result = await coro
           ^^^^^^^^^^
File "/project/openhands/runtime/impl/docker/docker_runtime.py", line 135, in connect
  self.runtime_container_image = build_runtime_image(
                                 ^^^^^^^^^^^^^^^^^^^^
File "/project/openhands/runtime/utils/runtime_build.py", line 137, in build_runtime_image
  result = build_runtime_image_in_folder(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/project/openhands/runtime/utils/runtime_build.py", line 232, in build_runtime_image_in_folder
  _build_sandbox_image(
File "/project/openhands/runtime/utils/runtime_build.py", line 361, in _build_sandbox_image
  image_name = runtime_builder.build(
               ^^^^^^^^^^^^^^^^^^^^^^
File "/project/openhands/runtime/builder/docker.py", line 163, in build
  raise subprocess.CalledProcessError(
subprocess.CalledProcessError: Command '['docker', 'buildx', 'build', '--progress=plain', '--build-arg=OPENHANDS_RUNTIME_VERSION=0.21.0', '--build-arg=OPENHANDS_RUNTIME_BUILD_TIME=2025-01-31T14:54:32.083523', '--tag=ghcr.io/all-hands-ai/runtime:oh_v0.21.0_d8d8j73ho5y441j5_g7m4yn2837jngz2k', '--load', '--platform=linux/amd64', '/tmp/tmp4vda6inc']' returned non-zero exit status 1.

----------[The above error occurred. Retrying... (attempt 2 of 5)]----------

@SmartManoj
Owner

SmartManoj commented Feb 1, 2025

openhands-runtime-scikit-learn__scikit-learn-15512

Now this is the new container name. Would you check the container logs?
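The logs requested above can be fetched with a small wrapper around docker logs. A sketch, assuming the Docker CLI is on PATH; the openhands-runtime-<instance_id> name format is copied from the 404 error in this thread and may differ between OpenHands versions, and both helper names are hypothetical.

```python
import shutil
import subprocess


def runtime_container_name(instance_id: str) -> str:
    """Container name format observed in this thread's logs (may change between versions)."""
    return f"openhands-runtime-{instance_id}"


def container_logs(instance_id: str, tail: int = 200) -> str:
    """Return the last `tail` log lines of the instance's runtime container."""
    if shutil.which("docker") is None:
        return "docker CLI not found on PATH"
    name = runtime_container_name(instance_id)
    result = subprocess.run(
        ["docker", "logs", "--tail", str(tail), name],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        # e.g. "No such container" when the container was already removed
        return f"could not read logs for {name}: {result.stderr.strip()}"
    # docker logs replays the container's stdout and stderr on separate streams
    return result.stdout + result.stderr


if __name__ == "__main__":
    print(container_logs("scikit-learn__scikit-learn-15512"))
```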


Would you apply this commit to see why the buildx command failed?
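The CalledProcessError in the traceback above reports only the exit status of docker buildx build, not its output. One general way to surface the reason (a sketch, not the referenced commit) is to capture stdout and stderr and attach them to the exception; run_and_explain is a hypothetical helper name.

```python
import subprocess
import sys


def run_and_explain(cmd: list[str]) -> str:
    """Run cmd and return its stdout.

    On failure, print the end of stderr and raise CalledProcessError
    carrying the captured output, so the cause is not lost.
    """
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        # The tail of stderr is usually where a build tool reports the real error.
        print(result.stderr[-2000:], file=sys.stderr)
        raise subprocess.CalledProcessError(
            result.returncode, cmd, output=result.stdout, stderr=result.stderr
        )
    return result.stdout
```

Passing the argument list shown in the traceback to such a helper would print the tail of the build's stderr before raising.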

@BIJOY-SUST

  1. Do you have further details on how to address this “Failed to cd” issue?

Now this is the new container name. Would you check the container logs?

-> There is no container with this name in the container list.

I believe the buildx command failed because it couldn’t locate the container, as mentioned in the inference logs. Interestingly, when I reran the inference for that specific instance, it completed without any issues. However, the same issue later appeared for other instances.

Would you apply this commit to see why the buildx command failed?

I applied this commit and reran the inference process.

@SmartManoj
Owner

SmartManoj commented Feb 1, 2025

Now this is the new container name. Would you check the container logs?

  1. Now this is the new container name format. Would you check the container logs for the Django instance?

@BIJOY-SUST

There are two issues now, and both occur on multiple instances:

  1. Failed to cd

File "/project/evaluation/utils/shared.py", line 286, in assert_and_raise
  raise EvalException(msg)
evaluation.utils.shared.EvalException: Failed to cd to /workspace/django__django__3.2: **CmdOutputObservation (source=None, exit code=-1, metadata={

  2. Not Found ("No such container...

docker.errors.NotFound: 404 Client Error for http+docker://localhost/v1.47/containers/openhands-runtime-scikit-learn__scikit-learn-14983/json: Not Found ("No such container: openhands-runtime-scikit-learn__scikit-learn-14983")

I’m running OpenHands again and will share the logs with you.
