diff --git a/docs/README.md b/docs/README.md
index 6d52914..802b901 100644
--- a/docs/README.md
+++ b/docs/README.md
@@ -17,4 +17,23 @@ To convert images to webp, run this from the project root:
 ```bash
 # convert docs/assets/images/examples
 python docs/scripts/convert_to_webp.py -d docs/assets/images/examples --height 400
+```
+
+## Run
+
+The weblinx homepage is built with jekyll, which uses ruby. If you do not have ruby installed, please install it, then install bundler:
+```bash
+# Install ruby and gem
+sudo apt-get install ruby-dev
+
+# install bundler
+gem install bundler -v 2.4.22
+```
+
+to run it locally, run this from the project root:
+
+```bash
+cd docs
+bundle install
+bundle exec jekyll serve
 ```
\ No newline at end of file
diff --git a/docs/_docs/weblinx.md b/docs/_docs/weblinx.md
index aa334f9..ef188b4 100644
--- a/docs/_docs/weblinx.md
+++ b/docs/_docs/weblinx.md
@@ -19,7 +19,7 @@ Generate a __repr__ method for a class with the given attributes
 ### `Demonstration`
 
 ```
-weblinx.Demonstration(name, base_dir="./demonstrations", json_backend="auto")
+weblinx.Demonstration(name, base_dir="./demonstrations", json_backend="auto", encoding=None)
 ```
 
 #### Description
@@ -33,6 +33,7 @@ weblinx.Demonstration(name, base_dir="./demonstrations", json_backend="auto")
 | `name` | `str` | | Name of the demonstration directory |
 | `base_dir` | `str` | `"./demonstrations"` | Base directory containing all demonstrations directories |
 | `json_backend` | `str` | `"auto"` | Backend to use to load JSON files. Can be either 'auto', 'json', 'orjson', or 'ujson'. For the 'auto' option, it will try to import 'orjson' first, then 'ujson', then 'json'. Both 'orjson' and 'ujson' are faster than 'json', but they must be installed first, via `pip install orjson` or `pip install ujson`. |
+| `encoding` | `str` | `None` | Encoding to use when reading and writing files. If None, it will use the system's default encoding. |
 
 
 #### `Demonstration.__repr__`
@@ -126,7 +127,7 @@ checking that all required files exist.
 #### `Demonstration.load_json`
 
 ```
-weblinx.Demonstration.load_json(self, filename, backend=None, default=None)
+weblinx.Demonstration.load_json(self, filename, backend=None, default=None, encoding=None)
 ```
 
 ##### Description
@@ -141,6 +142,7 @@ Load a JSON file from the demonstration directory
 | `filename` | `str` | | Name of the file to load |
 | `backend` | `str` | `None` | Backend to use to load the JSON file. Can be either None, 'auto', 'json', 'orjson', or 'ujson'. If 'auto', it will try to import 'orjson' first, then 'ujson', then 'json'. If None, it will use the json_backend specified in the constructor. |
 | `default` | `Any` | `None` | Default value to return if the file does not exist. If None, it will raise an error. |
+| `encoding` | `str` | `None` | Encoding to use when reading the file. If None, it will default to the Demonstration's encoding specified in the constructor, or the system's default encoding if it was not specified. |
 
 
 #### `Demonstration.save_json`
@@ -254,7 +256,7 @@ html_path = demo.join('pages', 'page-1-0.html')
 ### `Turn`
 
 ```
-weblinx.Turn(turn_dict, index, demo_name, base_dir, json_backend="auto")
+weblinx.Turn(turn_dict, index, demo_name, base_dir, json_backend="auto", encoding=None)
 ```
 
 #### Description
@@ -764,7 +766,7 @@ A string indicating the screenshot status ('good', 'broken', or None).
 #### `Turn.get_xpaths_dict`
 
 ```
-weblinx.Turn.get_xpaths_dict(self, uid_key="data-webtasks-id", cache_dir=None, allow_save=True, check_hash=False, parser="lxml", json_backend="auto")
+weblinx.Turn.get_xpaths_dict(self, uid_key="data-webtasks-id", cache_dir=None, allow_save=True, check_hash=False, parser="lxml", json_backend="auto", encoding=None)
 ```
 
 ##### Description
@@ -785,6 +787,7 @@ before using cached data.
 | `check_hash` | `bool` | `False` | Whether to validate the HTML hash before using cached XPaths. |
 | `parser` | `str` | `"lxml"` | The parser backend to use for HTML parsing. Currently, only 'lxml' is supported. |
 | `json_backend` | `str` | `"auto"` | The backend to use for loading and saving JSON. If 'auto', chooses the best available option. |
+| `encoding` | `str` | `None` | Encoding to use when reading the file. If None, it will default to the Demonstration's encoding specified in the constructor, or the system's default encoding if it was not specified. |
 
 
 ##### Returns
@@ -798,7 +801,7 @@ A dictionary mapping unique IDs (from `uid_key`) to their corresponding XPaths i
 ### `Replay`
 
 ```
-weblinx.Replay(replay_json, demo_name, base_dir)
+weblinx.Replay(replay_json, demo_name, base_dir, encoding=None)
 ```
 
 #### Description
@@ -813,6 +816,7 @@ Represents a replay of a demonstration, encapsulating a sequence of turns (actio
 | `replay_json` | `dict` | | The JSON object containing the replay data. |
 | `demo_name` | `str` | | The name of the demonstration this replay belongs to. |
 | `base_dir` | `str` | | The base directory where the demonstration data is stored. |
+| `encoding` | `str` | `None` | The encoding to use when reading files. If None, it will default to the system's default encoding. |
 
 
 #### `Replay.__getitem__`
diff --git a/docs/_docs/weblinx/processing.md b/docs/_docs/weblinx/processing.md
index 82eaf13..3b0bbdd 100644
--- a/docs/_docs/weblinx/processing.md
+++ b/docs/_docs/weblinx/processing.md
@@ -619,7 +619,7 @@ The utterances formatted as a string, or as a list if sep is None.
 ### `format_prev_turns_truncated`
 
 ```
-weblinx.processing.prompt.format_prev_turns_truncated(replay, turn, format_intent, tokenizer, num_tokens_to_remove, format_output_dict_fn=<_ast.Name object at 0x7feace0b3f40>, num_prev_turns=5, turn_sep=" ; ", allow_iterative_reduction=False)
+weblinx.processing.prompt.format_prev_turns_truncated(replay, turn, format_intent, tokenizer, num_tokens_to_remove, format_output_dict_fn=<_ast.Name object at 0x7fac2032a370>, num_prev_turns=5, turn_sep=" ; ", allow_iterative_reduction=False)
 ```
 
 #### Description
@@ -648,7 +648,7 @@ This output of this function should be used by format_utterances to display the
 ### `multi_attempt_format_prev_turns_truncated`
 
 ```
-weblinx.processing.prompt.multi_attempt_format_prev_turns_truncated(replay, turn, format_intent, tokenizer, max_tokens, num_prev_turns=5, turn_sep=" ; ", max_attempts=5, format_output_dict_fn=<_ast.Name object at 0x7feace024dc0>, warn_after_attempts=True, allow_iterative_reduction=False)
+weblinx.processing.prompt.multi_attempt_format_prev_turns_truncated(replay, turn, format_intent, tokenizer, max_tokens, num_prev_turns=5, turn_sep=" ; ", max_attempts=5, format_output_dict_fn=<_ast.Name object at 0x7fac2034a2b0>, warn_after_attempts=True, allow_iterative_reduction=False)
 ```
 
 #### Description
@@ -737,7 +737,7 @@ The top candidates for the given turn.
 ### `select_turns_and_candidates_for_prompts`
 
 ```
-weblinx.processing.prompt.select_turns_and_candidates_for_prompts(demos, candidates=None, num_candidates=20)
+weblinx.processing.prompt.select_turns_and_candidates_for_prompts(demos, candidates=None, num_candidates=20, remove_turns_without_elements=True)
 ```
 
 #### Description
@@ -756,6 +756,7 @@ for a given turn, then we will find the previous turn that has candidates.
 | `demos` | `list` | | The list of demonstrations to select the turns from. |
 | `candidates` | `dict` | `None` | The candidates for all turns, as a dictionary of lists. If None, then the candidates will not be used. Defaults to None. |
 | `num_candidates` | `int` | `20` | The number of candidates to select for each turn. Defaults to 20. |
+| `remove_turns_without_elements` | `bool` | `True` | Whether to remove turns that do not have elements. Defaults to True. |
 
 
 #### Returns
diff --git a/docs/_docs/weblinx/utils.md b/docs/_docs/weblinx/utils.md
index 92e00be..46cd6b3 100644
--- a/docs/_docs/weblinx/utils.md
+++ b/docs/_docs/weblinx/utils.md
@@ -125,13 +125,13 @@ not None, then this will be a random sample of the specified size.
 ### `auto_read_json`
 
 ```
-weblinx.utils.auto_read_json(path, backend="auto")
+weblinx.utils.auto_read_json(path, backend="auto", encoding=None)
 ```
 
 ### `auto_save_json`
 
 ```
-weblinx.utils.auto_save_json(data, path, backend="auto", indent=None)
+weblinx.utils.auto_save_json(data, path, backend="auto", indent=0)
 ```
 
 ### `save_results`
@@ -560,7 +560,7 @@ A string or dictionary representing the timestamp.
 ### `format_change`
 
 ```
-weblinx.utils.format.format_change(turn, formatters=(<_ast.Name object at 0x7feacdfde4c0>, <_ast.Name object at 0x7feacdfde5b0>, <_ast.Name object at 0x7feacdfde5e0>), return_as="dict")
+weblinx.utils.format.format_change(turn, formatters=(<_ast.Name object at 0x7fac2032da60>, <_ast.Name object at 0x7fac2032db50>, <_ast.Name object at 0x7fac2032db80>), return_as="dict")
 ```
 
 #### Description
@@ -577,7 +577,7 @@ when the input is changed, for example, in an input, select or textarea.
 | Name | Type | Default | Description |
 | ---- | ---- | ------- | ----------- |
 | `turn` | `Turn` | | The turn object to be represented as either a string or a dictionary. |
-| `formatters` | `` | `(<_ast.Name object at 0x7feacdfde4c0>, <_ast.Name object at 0x7feacdfde5b0>, <_ast.Name object at 0x7feacdfde5e0>)` | A tuple of functions to be used to format the turn. The functions will be called in order, and each function should return a dictionary (not a string), which will be merged into the final output. The functions should take the turn as the first argument, and return a dictionary. |
+| `formatters` | `` | `(<_ast.Name object at 0x7fac2032da60>, <_ast.Name object at 0x7fac2032db50>, <_ast.Name object at 0x7fac2032db80>)` | A tuple of functions to be used to format the turn. The functions will be called in order, and each function should return a dictionary (not a string), which will be merged into the final output. The functions should take the turn as the first argument, and return a dictionary. |
 | `return_as` | `str` | `"dict"` | Whether to return the formatted element as a string or a dictionary. |
 
 
@@ -592,7 +592,7 @@ A string or dictionary representing the turn.
 ### `format_click`
 
 ```
-weblinx.utils.format.format_click(turn, formatters=(<_ast.Name object at 0x7feacdfdecd0>, <_ast.Name object at 0x7feacdfdec70>, <_ast.Name object at 0x7feace01d040>), return_as="dict")
+weblinx.utils.format.format_click(turn, formatters=(<_ast.Name object at 0x7fac2032a610>, <_ast.Name object at 0x7fac2032a520>, <_ast.Name object at 0x7fac2032a670>), return_as="dict")
 ```
 
 #### Description
@@ -605,7 +605,7 @@ Format a turn with intent click into a readable format.
 | Name | Type | Default | Description |
 | ---- | ---- | ------- | ----------- |
 | `turn` | `Turn` | | The turn object to be represented as either a string or a dictionary. |
-| `formatters` | `` | `(<_ast.Name object at 0x7feacdfdecd0>, <_ast.Name object at 0x7feacdfdec70>, <_ast.Name object at 0x7feace01d040>)` | A tuple of functions to be used to format the turn. The functions will be called in order, and each function should return a dictionary (not a string), which will be merged into the final output. The functions should take the turn as the first argument, and return a dictionary. |
+| `formatters` | `` | `(<_ast.Name object at 0x7fac2032a610>, <_ast.Name object at 0x7fac2032a520>, <_ast.Name object at 0x7fac2032a670>)` | A tuple of functions to be used to format the turn. The functions will be called in order, and each function should return a dictionary (not a string), which will be merged into the final output. The functions should take the turn as the first argument, and return a dictionary. |
 | `return_as` | `str` | `"dict"` | Whether to return the formatted element as a string or a dictionary. |
 
 
@@ -678,7 +678,7 @@ A string or dictionary representing the turn.
 ### `format_hover`
 
 ```
-weblinx.utils.format.format_hover(turn, formatters=(<_ast.Name object at 0x7feace004a90>, <_ast.Name object at 0x7feace004ac0>, <_ast.Name object at 0x7feace004af0>), return_as="dict")
+weblinx.utils.format.format_hover(turn, formatters=(<_ast.Name object at 0x7fac20343040>, <_ast.Name object at 0x7fac20343070>, <_ast.Name object at 0x7fac203430a0>), return_as="dict")
 ```
 
 #### Description
@@ -691,7 +691,7 @@ This behaves similarly to format_click, but for hover events.
 | Name | Type | Default | Description |
 | ---- | ---- | ------- | ----------- |
 | `turn` | `Turn` | | The turn object to be represented as either a string or a dictionary. |
-| `formatters` | `list or tuple` | `(<_ast.Name object at 0x7feace004a90>, <_ast.Name object at 0x7feace004ac0>, <_ast.Name object at 0x7feace004af0>)` | A tuple of functions to be used to format the turn. The functions will be called in order, and each function should return a dictionary (not a string), which will be merged into the final output. The functions should take the turn as the first argument, and return a dictionary. |
+| `formatters` | `list or tuple` | `(<_ast.Name object at 0x7fac20343040>, <_ast.Name object at 0x7fac20343070>, <_ast.Name object at 0x7fac203430a0>)` | A tuple of functions to be used to format the turn. The functions will be called in order, and each function should return a dictionary (not a string), which will be merged into the final output. The functions should take the turn as the first argument, and return a dictionary. |
 | `return_as` | `str` | `"dict"` | Whether to return the formatted element as a string or a dictionary. |
 
 
@@ -807,7 +807,7 @@ Similar to format_mouse_xy, but for scroll events.
 ### `format_submit`
 
 ```
-weblinx.utils.format.format_submit(turn, formatters=(<_ast.Name object at 0x7feace0245b0>, <_ast.Name object at 0x7feace0245e0>), return_as="dict")
+weblinx.utils.format.format_submit(turn, formatters=(<_ast.Name object at 0x7fac20160b20>, <_ast.Name object at 0x7fac20160b50>), return_as="dict")
 ```
 
 #### Description
@@ -820,7 +820,7 @@ Format a turn with intent submit into a readable format.
 | Name | Type | Default | Description |
 | ---- | ---- | ------- | ----------- |
 | `turn` | `Turn` | | The turn object to be represented as either a string or a dictionary. |
-| `formatters` | `` | `(<_ast.Name object at 0x7feace0245b0>, <_ast.Name object at 0x7feace0245e0>)` | A tuple of functions to be used to format the turn. The functions will be called in order, and each function should return a dictionary (not a string), which will be merged into the final output. The functions should take the turn as the first argument, and return a dictionary. |
+| `formatters` | `` | `(<_ast.Name object at 0x7fac20160b20>, <_ast.Name object at 0x7fac20160b50>)` | A tuple of functions to be used to format the turn. The functions will be called in order, and each function should return a dictionary (not a string), which will be merged into the final output. The functions should take the turn as the first argument, and return a dictionary. |
 | `return_as` | `str` | `"dict"` | Whether to return the formatted element as a string or a dictionary. |
 
 
@@ -837,7 +837,7 @@ Format a turn with intent tabcreate, tabremove or tabswitch into a readable form
 ### `format_text_input`
 
 ```
-weblinx.utils.format.format_text_input(turn, formatters=(<_ast.Name object at 0x7feace033ca0>, <_ast.Name object at 0x7feace033df0>, <_ast.Name object at 0x7feace033f10>), return_as="dict")
+weblinx.utils.format.format_text_input(turn, formatters=(<_ast.Name object at 0x7fac20152220>, <_ast.Name object at 0x7fac20152340>, <_ast.Name object at 0x7fac20152490>), return_as="dict")
 ```
 
 #### Description
@@ -854,7 +854,7 @@ is changed.
 | Name | Type | Default | Description |
 | ---- | ---- | ------- | ----------- |
 | `turn` | `Turn` | | The turn object to be represented as either a string or a dictionary. |
-| `formatters` | `` | `(<_ast.Name object at 0x7feace033ca0>, <_ast.Name object at 0x7feace033df0>, <_ast.Name object at 0x7feace033f10>)` | A tuple of functions to be used to format the turn. The functions will be called in order, and each function should return a dictionary (not a string), which will be merged into the final output. The functions should take the turn as the first argument, and return a dictionary. |
+| `formatters` | `` | `(<_ast.Name object at 0x7fac20152220>, <_ast.Name object at 0x7fac20152340>, <_ast.Name object at 0x7fac20152490>)` | A tuple of functions to be used to format the turn. The functions will be called in order, and each function should return a dictionary (not a string), which will be merged into the final output. The functions should take the turn as the first argument, and return a dictionary. |
 | `return_as` | `str` | `"dict"` | Whether to return the formatted element as a string or a dictionary. |
 
 
@@ -869,7 +869,7 @@ A string or dictionary representing the turn.
 ### `format_intent_automatically`
 
 ```
-weblinx.utils.format.format_intent_automatically(turn, format_change=<_ast.Name object at 0x7feace039c40>, format_click=<_ast.Name object at 0x7feace039c70>, format_copy=<_ast.Name object at 0x7feace039ca0>, format_hover=<_ast.Name object at 0x7feace039cd0>, format_load=<_ast.Name object at 0x7feace039d00>, format_paste=<_ast.Name object at 0x7feace039d30>, format_say=<_ast.Name object at 0x7feace039d60>, format_scroll=<_ast.Name object at 0x7feace039d90>, format_submit=<_ast.Name object at 0x7feace039dc0>, format_tab=<_ast.Name object at 0x7feace039df0>, format_text_input=<_ast.Name object at 0x7feace039e20>, return_as="dict")
+weblinx.utils.format.format_intent_automatically(turn, format_change=<_ast.Name object at 0x7fac201621f0>, format_click=<_ast.Name object at 0x7fac20162220>, format_copy=<_ast.Name object at 0x7fac20162250>, format_hover=<_ast.Name object at 0x7fac20162280>, format_load=<_ast.Name object at 0x7fac201622b0>, format_paste=<_ast.Name object at 0x7fac201622e0>, format_say=<_ast.Name object at 0x7fac20162310>, format_scroll=<_ast.Name object at 0x7fac20162340>, format_submit=<_ast.Name object at 0x7fac20162370>, format_tab=<_ast.Name object at 0x7fac201623a0>, format_text_input=<_ast.Name object at 0x7fac201623d0>, return_as="dict")
 ```
 
 #### Description