Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issues with "Completeness of Synthesized Generators" evaluation #33

Open
10 tasks
Pat-Lafon opened this issue Jan 19, 2024 · 0 comments
Open
10 tasks

Issues with "Completeness of Synthesized Generators" evaluation #33

Pat-Lafon opened this issue Jan 19, 2024 · 0 comments

Comments

@Pat-Lafon
Copy link
Contributor

Pat-Lafon commented Jan 19, 2024

Hey Zhe, in looking at the errors Poirot provides for ill-typed programs(to investigate localization of errors and recovering paths) I worked my way through some of the synthesize programs for section 6.3 of the evaluation.

In my opinion, there are some disappointing issues with this part of the evaluation though I understand this wasn't necessarily the focus of the paper. It's probably important to revisit this at least for the synthesis direction of current work.

I wasn't here when this was done, feel free to correct me where ever I am mistaken. I was running on the artifact branch of this repository with minor modifications on my end for debugging

  • Limited Structure: From the paper: "Since the space of these programs is potentially quite large (possibly infinite), we constrain the synthesizer to only generate programs with bounded function call depths; in our experiments, this bound was set to three.". In general, the corpus of programs used was disappointing. It's much more limited than the paper suggests with fewer components and only programs sharing identical structure. In particular, Figure 10 b is not represented. Your example of a sortedlist is also not represented and maybe not synthesizeable by Cobalt?
  • Duplicates: It's silly but some of the programs are duplicated and should be filtered out.
  • gt_eq_int_gen: I think you must of used a different specification for synthesis than what is currently used in the repository. It seems to just be int_gen with one extra step at the moment. This causes issues in a couple of places since it ends up violating: "The generator outputs all programs that are safe with respect to the specification.". However, you do seem to need this behavior to get some "complete" examples for both sizedlist and sortedlist. Are there somehow two specifications for this floating around?

I also maintained a list of example test cases for sizedlist and sorted list of programs that I thought were handled incorrectly for one reason or another.

  • SizedList: There are a bunch of obviously programs in terms of coverage like program 24 or 80 which uses some bug with singleton lists to get full coverage.
  • SizedList: Obviously wrong examples which get rejected for safety, many like 65/104 which creates double-sized lists. 10 which violates the recursion constraint.
  • SortedList: 2 or 11 which are just single element lists which get full coverage
  • SortedList: 3 which creates a list that doesn't use x0 at all which violates safety
  • SortedList: 9 passes the coverage check despite creating a sorted list of just the same element.
  • SortedList: 10 Might be a correct though disappointing solution except that I'm not sure what the correct spec of gt_eq_int_gen should be. With what my current expectation is, this should not pass.
  • SortedList: 12 is similar to previous cases except that the first element is probably too small no? This might be more of an issue with the spec of sortedlist instead.

As a general note, for atleast these two cases, I would say the false positive rate is high, over 50% at least.

It's interesting to think about timeouts here. The repository has this value set pretty high. Sometimes this helps with certain checks but in general this isn't going to work for synthesis. It's interesting to think about how changing that value my impact the synthesis algorithm.

@Pat-Lafon Pat-Lafon changed the title Issues with "Completeness of Synthesized Generators" evaulation Issues with "Completeness of Synthesized Generators" evaluation Jan 31, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant