h2oGPT 0.2.0 Release
Official Release for h2oGPT 0.2.0
What's Changed
- Add code to push spaces chatbot by @pseudotensor in #46
- Fixes #48 by @pseudotensor in #55
- More HF spaces restrictions to prevent OOM or no-good choices being chosen by @pseudotensor in #57
- Add max_beams to client_test.py by @lo5 in #64
- Fix directory name from h2o-llm to h2ogpt on install tutorial by @cpatrickalves in #63
- h2o theme for background by @jefffohl in #68
- Add option to save prompt and response as .json. by @arnocandel in #69
- Update tos.md by @eltociear in #70
- Use SAVE_DIR and --save_dir instead of SAVE_PATH and --save_path. by @arnocandel in #71
- Make chat optional from UI/client by @pseudotensor in #74
- Compare models by @pseudotensor in #42
- H2O gradio theme by @jefffohl in #84
- Refactor gradio into separate file and isolate it from torch specific stuff by @pseudotensor in #85
- Refactor finetune so some of it can be used to check data and its tokenization by @pseudotensor in #93
- Llama flash attn by @arnocandel in #86
- Give default context to help chatbot by @pseudotensor in #100
- CUDA mismatch work-around for no gradio case by @pseudotensor in #101
- Add Triton deployment template. by @arnocandel in #91
- Check data for unhelpful responses by @pseudotensor in #103
- Clear torch cache memory every 20s by @pseudotensor in #90
- Try transformers experimental streaming. Still uses threads, so probably won't fix browser exit GPU memory issue by @pseudotensor in #98
- Handle thread stream generate exceptions. by @pseudotensor in #110
- Specify chat separator by @pseudotensor in #114
- [DOCS] README typo fix and readability improvements by @zainhaq-h2o in #118
- Support OpenAssistant models in basic form, including 30B xor one by @pseudotensor in #119
- Add stopping condition to pipeline case by @pseudotensor in #120
- Allow auth control from CLI by @pseudotensor in #123
- Improve data prep by @arnocandel in #122
- [DOCS] Grammar / readability improvements for FAQ.md by @zainhaq-h2o in #124
- neox Flash attn by @arnocandel in #31
- Langchain integration by @pseudotensor in #111
- Allow CLI add to db and clean-up handling of evaluate args by @pseudotensor in #137
- Add zip upload and parallel doc processing by @pseudotensor in #138
- Control visibility of buttons, but still gradio issues mean can't spin/block button while processing in background by @pseudotensor in #140
- Add URL support by @pseudotensor in #142
- HTML, DOCX, and better markdown support by @pseudotensor in #143
- odt, pptx, epub, UI text paste, eml support (both text/html and text/plain) and refactor so glob simpler by @pseudotensor in #144
- Reform chatbot client API code by @pseudotensor in #117
- Add import control check to avoid leaking optional langchain stuff into generate/gradio. Add test by @pseudotensor in #146
- [DevOps] Snyk Integration by @ChathurindaRanasinghe in #131
- Add image support and show sources after upload by @pseudotensor in #147
- Update finetune.py by @orellavie1212 in #132
- ArXiv support via URL in chatbot UI by @pseudotensor in #152
- Improve caption, include blip2 as option by @pseudotensor in #153
- Control chats, save, export, import and otherwise manage by @pseudotensor in #156
- Mac/Windows install and GPT4All as base model for pure CPU mode support by @pseudotensor in #157
- Move loaders out of finetune, which is only for training, while loader used for generation too by @pseudotensor in #161
- Allow selection of subset of docs in collection for query by @pseudotensor in #163
- Improve datasource layout by @pseudotensor in #164
- Refactor run_qa_db a bit, so can do other tasks by @pseudotensor in #167
- Use latest peft/transformers/accelerate/bitsandbytes for 4-bit (qlora) by @arnocandel in #166
- Refactor run_eval out of generate.py by @pseudotensor in #173
- Add CLI mode with tests by @pseudotensor in #174
- Separate out FAISS from requirements by @pseudotensor in #184
- Generalize h2oai_pipeline so works for any instruct model we have prompt_type for, so run_db_qa will stream and stop just like non-db code path by @pseudotensor in #190
- Ensure can use offline by @pseudotensor in #191
- Fix and test llamacpp by @pseudotensor in #197
- Improve use of ctx vs. max_new_tokens for non-HF models, and if no docs, don't insert == since no docs, just confuses model by @pseudotensor in #199
- UI help in FAQ by @pseudotensor in #205
- Quantized model updates, switch to recommending TheBloke by @pseudotensor in #208
- Fix nochat API by @pseudotensor in #209
- Move docs and optional reqs to directories by @pseudotensor in #214
- Allow for custom eval json file by @pseudotensor in #227
- Fix run_eval and validate parameters are all passed by @pseudotensor in #228
- Add setup.py wheel building option by @pseudotensor in #229
- [DevOps] Fix condition issue for snyk test & snyk monitor by @ChathurindaRanasinghe in #169
- Add weaviate support by @hsm207 in #218
- More weaviate tests by @pseudotensor in #231
- Allow add to db when loading from generate by @pseudotensor in #212
- Allow update db from UI if files changed, since normally not constantly checking for new files by @pseudotensor in #232
- More control over max_max_new_tokens and memory behavior from generate args by @pseudotensor in #234
- Make API easier, and add prompt_dict for custom control over prompt as example of new API parameter don't need to pass by @pseudotensor in #238
- Chunk improve by @pseudotensor in #239
- Fix `TypeError: can only concatenate str (not "list") to str` on startup by @this in #242
- Fix nochat in UI so Enter works to submit again, and if langchain mode is used, show HTML links for sources by @pseudotensor in #244
- Improve subset words and code by @pseudotensor in #245
- use instructor embedding, and add migration of embeddings if ever changes, at least for chroma by @pseudotensor in #247
- Add extra clear torch cache calls so embedding on GPU doesn't stick to GPU by @pseudotensor in #252
- Fixes #249 by @pseudotensor in #255
- Support connecting to a local weaviate instance by @hsm207 in #236
- .gitignore updated for .idea and venv by @fazpu in #256
- move enums and add test for export copy since keep changing what files have what structures by @pseudotensor in #260
- Ensure generate hyperparameters are passed through to h2oai_pipeline.py for generation by @pseudotensor in #265
- Submit button is now primary + more spacing between prompt area and action buttons by @fazpu in #261
- input prompt - primary color border added + change in label text by @fazpu in #259
- prompt form moved to a separate file by @fazpu in #258
- Upgrade gradio by @pseudotensor in #269
- Fixes #270 by @pseudotensor in #272
- A couple of small updates to the documentation by @3x0dv5 in #274
- Add documentation on how to connect to weaviate by @hsm207 in #267
- Update Weaviate and FAISS a bit to be closer to Chroma in h2oGPT with limitations. Add testing. by @pseudotensor in #275
- Escape so it outputs `$LD_LIBRARY_PATH:/usr/local/cuda/lib64/` by @3x0dv5 in #276
- Add h2oGPT Client by @this in #133
- Update requirements and add code to get latest versions by @pseudotensor in #281
- Pass actual eos id to generate, else doesn't know how to stop early if using non-standard eos id (normally=0, falcon GM was 11) by @pseudotensor in #286
- Fixes #291 -- make user_path if doesn't exist but passed, and move gradio temp file to user_path if passed to generate. by @pseudotensor in #292
- Add QUIP et al. metrics for context-Q-A testing by @pseudotensor in #293
- Add ci support + wheel by @achraf-mer in #243
- Option to fill-up context if top_k_docs=-1 by @pseudotensor in #294
- Update README.md by @arnocandel in #296
- Fixes #279 by @pseudotensor in #297
- Update README.md by @eltociear in #301
- Add support for text-generation-server, gradio inference server, OpenAI inference server. by @pseudotensor in #295
- Use latest peft, to fix export failure. by @arnocandel in #310
- Model N by @pseudotensor in #313
- Control maxtime for TGI by @pseudotensor in #322
- Fix prompting by @pseudotensor in #327
- Protect evaluate against bad inputs by @pseudotensor in #326
- Add explicit cast to bool for visible=kwarg[...] by @parkeraddison in #328
- For multi-model ChatAll view, save all models together so can recover together by @pseudotensor in #331
- Fixes #333 by quickly checking if can reach endpoint using requests by @pseudotensor in #334
- ChatAll: stream as fast as one can with short timeout=0.01 to avoid stalling any single endpoint to appear in UI by @pseudotensor in #335
- Handle exceptions better when doing multi-view model lock, and don't block good endpoints by @pseudotensor in #337
- Queue input to avoid fresh submit using input_list at click/enter time, else truncates result because it uses input_list from time of click by @pseudotensor in #341
- Cleanup gradio UI a bit, ask at top so not lost at bottom by @pseudotensor in #344
- Log extra information, and fix max_max_new_tokens by @pseudotensor in #347
- Fix prompting for gradio->gradio by @pseudotensor in #348
- Reset client hash every client call, and reset client state if server changes for when want stateless client by @pseudotensor in #351
- If have HF model/tokenizer, use that instead of fake tokenizer (tiktoken), since see too large differences and failures even with 250 token buffer, still off by another 350. by @pseudotensor in #352
- If only can add to MyData, automatically add without having to click button by @pseudotensor in #353
- Get number of tokens after limited, although before prompt_type is applied, to reduce max_new_tokens for OpenAI by @pseudotensor in #366
- Typo: One some systems -> On some systems by @ernstvanderlinden in #377
- 8bit mode command fix FAQ.md by @0xraks in #375
- Fixes #368 by rotating session hash for streaming too. by @pseudotensor in #384
- Add a docker runtime, to be used to run h2o gpt models by @achraf-mer in #381
- Update README_LangChain.md by @cimadure in #387
- Fixes #382 for offloading llama to GPU using llama.cpp. by @pseudotensor in #393
- Fix nochat by @pseudotensor in #394
- Client revamp by @this in #349
- Update README_CPU.md by @wienke in #402
- Add summarization action by @pseudotensor in #365
- Move files to src by @pseudotensor in #406
- Add test to the Client to validate parameters order with h2ogpt by @this in #392
- Tweaks for MAC M1 by @Mathanraj-Sharma in #408
- Add AutoGPTQ -- Fixes #263 and Fixes #339 and Fixes #417 by @pseudotensor in #416
- Fix prompt answer after broken after vicuna addition by @pseudotensor in #422
- Improve UI and UX -- Fixes #285 by @pseudotensor in #429
- FAQ.md - make the information about hugging face models stand out + fix for the prompter link by @fazpu in #435
- [DOCS] Fix typos and improve readability (FAQ page) by @zainhaq-h2o in #441
- Autoset langchain_mode if not passed by @pseudotensor in #436
- Fix attribute error for NoneType - Python 3.9 by @Mathanraj-Sharma in #445
- Add vLLM support -- Fixes #312 by @pseudotensor in #454
- add more deps to docker by @achraf-mer in #452
- Update Windows bitsandbytes wheel link by @jllllll in #458
- Update docs for llama metal support by @Mathanraj-Sharma in #470
- [DevOps] Update README for docker runtime image consumption by @ChathurindaRanasinghe in #477
- Fix typo in timeout_iterator.py by @eltociear in #479
- Add E2E test for fine-tuning/export/tgi, exposed issue with TGI 0.9.1, works in 0.8.2/0.9.2 by @arnocandel in #424
- Use latest bitsandbytes and accelerate. by @arnocandel in #485
- Revert "Use latest bitsandbytes and accelerate." by @pseudotensor in #488
- Update main readme for docker runtime consumption by @ChathurindaRanasinghe in #489
- Package more modules to python wheel by @achraf-mer in #492
- Add note to install torch 2.1 for MPS by @Mathanraj-Sharma in #491
- UI is spread to the full width by @fazpu in #495
- add prompt template for llama2 by @arnocandel in #494
- Fixes #398 -- Custom collection/db and user_path, persisted to disk f… by @pseudotensor in #476
- Remove system prompt for llama2, too guarded. by @arnocandel in #506
- Minor Shell Script Changes by @slycordinator in #487
- Add tagging docker runtime with semver by @ChathurindaRanasinghe in #500
- Update readme for semver docker runtime image by @ChathurindaRanasinghe in #511
- Fixes #446 Control chat history being added to context for langchain or not by @pseudotensor in #507
- LLaMa2 with AutoGPTQ and 16-bit mode with RoPE scaling by @pseudotensor in #517
- Fixes #514 Fix llama2 prompting by @pseudotensor in #523
- Support exllama by @pseudotensor in #526
- Exclude unnecessary files & directories from wheel by @ChathurindaRanasinghe in #537
- Improve docs by @pseudotensor in #542
- feat: expose 'root_path' from gradio by @jcatana in #547
- Add long context tests and fix tokenizer truncation for activated rope_scaling by @arnocandel in #524
- Set max_seq_len outside config, config can't always set due to protections in class by @pseudotensor in #554
- Add latest tag to docker runtime by @ChathurindaRanasinghe in #560
- Load args from env vars as long as var starts with H2OGPT_ by @achraf-mer in #556
- Isolate n_gpus base handling for CUDA from MPS by @Mathanraj-Sharma in #563
- Efficient parallel summarization and use full docs, not vectordb chunks. by @pseudotensor in #551
- Unblock streaming for multi-stream case by @pseudotensor in #577
- Use docker entrypoint args instead of custom entry point by @achraf-mer in #521
- Minor docs update by @achraf-mer in #591
- Control embedding migration by @pseudotensor in #585
- Fix client's ValueError: An event handler (fn) didn't receive enough input values (needed: 34, got: 32). by @pseudotensor in #600
- Upgrade gradio-client version to v0.3.0 in the Client by @this in #601
- Test docker of TGI + h2oGPT dockers by @pseudotensor in #602
- docs: Add GCR link to the Docker README by @ChathurindaRanasinghe in #604
- Better handling of pdfs if broken by @pseudotensor in #608
- Add replicate support, Fixes #603 by @pseudotensor in #606
- Document how to disable chroma telemetry by @mmalohlava in #543
- Use `/submit_nochat_api` for the Text Completion API in the Client by @this in #609
- Allow server to save history.json with request headers by @pseudotensor in #613
- Fine tune llama2 by @arnocandel in #574
- [Client] Parse the return value from `/submit_nochat_api` to extract the response by @this in #627
- Handle persistence of user states for personal/scratch spaces. by @pseudotensor in #618
- Windows installer by @pseudotensor in #647
- Ensure meta data in response and Fixes #649 and upgrade gradio by @pseudotensor in #653
- Fix docker permissions and allow using a non root user by @achraf-mer in #664
- some fixes for docker run by @zba in #659
- Minor grammatical changes by @anfrd in #663
- mac install readme updated - the tessaract command by @fazpu in #665
- the message copy button moved closer to top border and message padding increased to 16px by @fazpu in #666
- Azure OpenAI by @pseudotensor in #667
- the copy button is placed at the bottom of each message by @fazpu in #668
- Fixes for conda installation issues by @ChathurindaRanasinghe in #681
- Improve bitsandbytes usage and control in UI by @pseudotensor in #682
- Add pandas as a direct dependency for vLLM by @ChathurindaRanasinghe in #686
- GitHub Action Workflow to Publish Python Package by @ChathurindaRanasinghe in #670
- Fixes #678 and Fixes #451 and Fixes #434 by @pseudotensor in #690
- Adding install Git command to windows installation instructions by @ceriseghost in #698
- Fixes #709 -- improve in-context learning control by @pseudotensor in #720
- add build id as a docker tag (will make it easier to trace in CI history) by @achraf-mer in #719
- Improve offline caching by @achraf-mer in #715
- Make sure cache directory is consistent, and is pointing to /workspace/.cache by @achraf-mer in #716
- Add performance benchmarks. by @arnocandel in #648
- Better control over prompting for document Q/A by @pseudotensor in #721
- fix: Modify JavaScript code generation to be compatible with Gradio Blocks. by @mmalohlava in #696
- explicitly set additional cache directories to be under ~/.cache (or /workspace/.cache) by @achraf-mer in #728
- change doc to run with local host user, so local host cache can be reused. by @achraf-mer in #730
- More documentation updates by @achraf-mer in #732
- Add vLLM in docker by @ChathurindaRanasinghe in #714
- Fix llama2 by @arnocandel in #747
- Fix Llama2 7B fine-tuning by @arnocandel in #644
- Add ability to control quality-effort of ingestion/parsing and add support for json, jsonl, gzip by @pseudotensor in #737
- Fix make_db.py from docker and document in readme by @achraf-mer in #750
- [Docs] Change docker image name in vllm by @ChathurindaRanasinghe in #753
- adding first draft of doctr integration by @ryanchesler in #752
- Rc/#762 fixes file upload hanging on UI by @ryanchesler in #765
- Added prompter entries for lmsys/vicuna-7b-v1.5, lmsys/vicuna-13-v1.5… by @patrickhwood in #756
- don't set envs, just keep the defaults from HOME env var by @achraf-mer in #733
- [DOCS] Fix the link to offline README by @muendelezaji in #778
- Added the option to create OCRed documents that are layout aware by @ryanchesler in #779
- h2oGPT Helm Chart by @EshamAaqib in #770
- doc(macos): pin `llama-cpp-python` version to support GGML by @iam4x in #780
- Fixes #703 Bugfix: Broken multilanguage output by @Mins0o in #790
- added pix2struct by @ryanchesler in #792
- Fixes #508 by @pseudotensor in #805
- attach button added to the prompt form by @fazpu in #674
- DocTR handling of pdfs by @ryanchesler in #787
- Improve docker layer caching to reduce overall image size by @achraf-mer in #803
- Add prompt type for Falcon-180B(-chat) by @arnocandel in #806
- [DevOps] Packer scripts for Azure, GCP & Jenkins pipeline by @ChathurindaRanasinghe in #788
- Add softlink to preserve compatibility with old commands from docs and readme(s) by @achraf-mer in #808
- consolidate install script in one place, speed up build, + fix caching for TGI and vLLM by @achraf-mer in #813
- Rebuild duckdb with control over threads to avoid excessive threads per db when system has large core count by @pseudotensor in #810
- enable_pdf_doctr in utils by @ffalkenberg in #819
- Allow choose model from UI and client via model_active_choice option when using model_lock by @pseudotensor in #820
- Merge nochat API model_active_choice with visible_models by @pseudotensor in #823
- landing screens components re-ordered on mobile screens by @fazpu in #827
- header styling changed on mobile screen by @fazpu in #829
- prompt area and upload button adjusted for mobile screens by @fazpu in #830
- visible models don't have the remove-all button by @fazpu in #831
- Build duckdb using manylinux by @achraf-mer in #834
- Bump helm chart build to 85. by @tomkraljevic in #833
- Fix build tag by @achraf-mer in #836
- Update to new chroma to fix DB corruption issues by @pseudotensor in #837
- labels are brighter by @fazpu in #818
- app styling updated by @fazpu in #856
- Keyed access by @pseudotensor in #850
- css cleanup - two unused css id definitions removed by @fazpu in #852
- dark theme - secondary button styling improved, label background color a bit lighter by @fazpu in #857
- helm chart improvements by @achraf-mer in #825
- Simplify system_prompt, no more separate use_system_prompt, and ensure pass-through to all models that take a system prompt, e.g. openai, replicate if supported, llama2, beluga, falcon180 by @pseudotensor in #867
- Allow pre-appending chat conversation by @pseudotensor in #869
- Chore: Add printing Makefile variables by @ChathurindaRanasinghe in #872
- Fixes #873 by @pseudotensor in #874
- Better prepare offline docs and code by @pseudotensor in #877
- Add text_context_list to directly pass text lists to LLM to avoid db etc. steps if don't care about persisting state and just want LLM to use context as if uploaded docs by @pseudotensor in #879
- Fix locking by @pseudotensor in #883
- Account for prompt when counting tokens in prompt template by @pseudotensor in #891
- Remove extra wget by @lamw in #892
- Move the `h2ogpt_key` param to the constructor of the `Client` by @this in #899
- Allow rw to /workspace data by @achraf-mer in #902
- more cleanup to docker build scripts by @achraf-mer in #903
- Web search and Agents by @pseudotensor in #858
- configure update strategy by @lweren in #909
- Standardizes `--llamacpp_dict` usage in docs by @jamesbraza in #845
- [DevOps] Build wheel after modifying version in workflow by @ChathurindaRanasinghe in #921
- fix volume mounts by @achraf-mer in #919
- Fix handling of chat_conversation+system prompt using doing langchain by @pseudotensor in #920
- Bump Helm Version by @EshamAaqib in #824
- Fix OpenAI summarization and use of text_context_list prompting and simplify code by @pseudotensor in #924
- Speed-up sim search if only doing chunk_id filter. Speed-up other various tasks if large db. by @pseudotensor in #929
- Update docs for MACOS MPS by @Mathanraj-Sharma in #911
- Update Windows Installer files for October 2023 by @pseudotensor in #930
- Add docker compose for vllm and when running on CPU mode by @achraf-mer in #927
- add extra env variables by @lweren in #940
- Hk/main/benchmark plots by @hemenkapadia in #941
- Improve summarization and add extraction -- speed-up streaming by @pseudotensor in #935
- External LLM Support - Helm by @EshamAaqib in #944
- Improve airgapped cache by @achraf-mer in #952
- Add AWQ by @pseudotensor in #954
- Bump helm chart version by @EshamAaqib in #957
- Update README_ui.md by @squidwardthetentacles in #925
- Improve timeout via max_time in UI/API by @pseudotensor in #958
- [DOCS] Improve FAQ readability by @zainhaq-h2o in #959
- Ensure clone takes into account client inside endpoints. Persist client typically unless can't or don't request, since always using clone now. by @pseudotensor in #966
- [DOCS] Improve readability of README (second edit) by @zainhaq-h2o in #968
- [DOCS] Improve readability of INSTALL.md by @zainhaq-h2o in #971
- Add attention_sinks support for arbitrarily long generation by @pseudotensor in #973
- Add gputil python package by @tomkraljevic in #982
- Fixed typo gpu_mem_track.py modelling_RW_falcon40b.py modelling_RW_falcon7b.py by @AniketP04 in #981
- Fixed typo timeout_iterator.py by @AniketP04 in #988
- Catch exception if not quite in job.future._exception and raise up to gradio for adding to chat exceptions in UI or raise direct if API. by @pseudotensor in #989
- Add prompt and test for https://huggingface.co/BAAI/AquilaChat2-34B-16K and related chat models by @pseudotensor in #986
- fix the problem with image pull secrets by @lweren in #992
- relax max_new_tokens to be per prompt by @pseudotensor in #998
- Avoid system OOM when too many pages for doctr by @pseudotensor in #999
- Implement HYDE by @pseudotensor in #1004
- Use migration-safe `/submit_nochat_api` for the ChatCompletion API by @this in #1010
- Add `client.list_models()` method by @this in #1012
- Update get_limited_prompt and use tokenizer from llama.cpp directly by @pseudotensor in #1015
- Typo by @MSZ-MGS in #1016
- init container image override by @lweren in #1019
- Fix source file link by @us8945 in #1024
- For codellama or other JSON friendly models, stack system prompt with instructions and give document chunks in json, and ask for json output. by @pseudotensor in #978
- Refactor models API in the Client by @this in #1026
- Rename `models` param to `model` in the Client by @this in #1029
- Update client/README.md by @surenH2oai in #1031
- One click installer setup for MacOS by @Mathanraj-Sharma in #1033
- Add `client.server` API to the Client by @this in #1036
- [Client] Refactor classes in the completion APIs into a separate sub-module by @this in #1039
- Make Helm work with external and local LLM's by @EshamAaqib in #1034
- Add annotations to h2ogpt web svc by @ozahavi in #1044
- [Docs] Add downloading client from the GH release by @ChathurindaRanasinghe in #1042
- [Client] Add streaming support for the text completion API by @this in #1046
- Various summarization/extraction fixes + easier llama.cpp control + redesign of Models UI by @pseudotensor in #1045
- Allow multiple llama, but llama.cpp is not thread safe, so only allowed if doing inference server for all but one. by @pseudotensor in #1050
- Update links by @arnocandel in #1056
- Windows one-click Nov5 by @pseudotensor in #1055
- Made changes Singtel requested by @overaneout in #1038
- [DevOps] Cloud Image Fixes by @ChathurindaRanasinghe in #1057
- Add configs related to MPS for one click installer by @Mathanraj-Sharma in #1060
- Add Mac one click installer to README - NOV 08, 2023 by @Mathanraj-Sharma in #1064
- Fix prompting in langchain pandas csv agents, missing format_instructions and uses mrkl prompt even if make class on top, and no way to work around by @pseudotensor in #1058
- Youtube and local audio transcription by @pseudotensor in #1070
- Commands to give permissions for Mac one-click installers by @Mathanraj-Sharma in #1071
- Fix preload of ASR and allow embedding model to be on any GPU by @pseudotensor in #1074
- Parse files inside tar.gz by @Mathanraj-Sharma in #1073
- Reorganize UI a bit, and make it easier to upload url vs. text, autodetect by @pseudotensor in #1075
- Add deepseek coder prompt by @pseudotensor in #1083
- Update README_offline.md by @achraf-mer in #1096
- [DOCS] Improve readability of README_ui.md by @zainhaq-h2o in #1103
- Streaming Speech-to-Text (STT) and Streaming Text-to-Speech (TTS) with Voice Cloning and Hands-Free Chat by @pseudotensor in #1089
- Update README.md (cosmetics) by @MSZ-MGS in #1104
- Fix Typo in FAQ by @daanknoope in #1112
- [DOCS] Client APIs README typo fixes and readability edit by @zainhaq-h2o in #1117
- [HELM] Remove default values from overrideConfig by @EshamAaqib in #1120
- [DOCS] Improve GPU readme by @zainhaq-h2o in #1141
- Upgrade to gradio4 by @pseudotensor in #1110
- For Issue #1142, not a specific fix yet. Noticed documents that failed to parse were coming up as selectable documents. Fix that. by @pseudotensor in #1150
- More for Issue #1142 -- allow filter files and content by substrings and operations and/or by @pseudotensor in #1151
- Return prompt_raw so e.g. LLM and langchain prompting with docs can be seen by API by @pseudotensor in #1152
- Update gpt_langchain.py to support Youtube Shorts by @cherrerajobs in #1154
- web scrape by @pseudotensor in #1156
- Chunk streaming to help speed due to gradio UI slowness/bugs by @pseudotensor in #1162
- Use openai v1 for vllm by @pseudotensor in #1164
- [DOCS] Improve Linux readme by @zainhaq-h2o in #1149
- More streaming optimizations for good UX by @pseudotensor in #1171
- Gradio API call examples by @us8945 in #1174
- Make Claude and other non-system prompt models use chat history to mimic system prompt by @pseudotensor in #1177
- Minor doc improvements by @zainhaq-h2o in #1187
- Remove call to ngpus and openai/vllm client creation so faster when using by @pseudotensor in #1192
- Add video frame extraction, image chat, and image generation by @pseudotensor in #1181
- [HELM] Add option to run vLLM and h2oGPT on same pod by @EshamAaqib in #1194
- Gemini by @pseudotensor in #1208
- [DOCS] Minor doc fixes and improvements by @zainhaq-h2o in #1205
- Support docsgpt https://huggingface.co/Arc53/docsgpt-7b-mistral by @pseudotensor in #1215
- improve streaming and error logging by @pseudotensor in #1218
- docs: faq: document auth.json file format by @Blacksuan19 in #1206
- Use transformers version of attention sinks: https://github.com/huggingface/transformers/releases/tag/v4.36.0 by @pseudotensor in #1219
- Allow private model that fails to load to not revert tokenizer to None if passed tokenizer_base_model by @pseudotensor in #1223
- hide action selection if only one action is enabled by @Blacksuan19 in #1224
- add ability to set custom page title and favicon by @Blacksuan19 in #1225
- OpenAI Proxy Server redirects to Gradio Server by @pseudotensor in #1231
- Improve testing for OpenAI server and fix key issues with auth etc. by @pseudotensor in #1234
- Handle errors better for OpenAI client by @pseudotensor in #1235
- Reachout by @pseudotensor in #1236
- Allow langchain for eval and add test -- Fixes #1244 by @pseudotensor in #1246
- Allow persistence for GradioClient for Issue #1247 by @pseudotensor in #1249
- Fixes #1247 by @pseudotensor in #1251
- Go back to checking system hash since stored in docker image now, even if takes 0.2s, worth it. Could delay checks to every minute or something, but more risky. by @pseudotensor in #1253
- [HELM] Add vLLM check when running as stack by @EshamAaqib in #1255
- Control llava prompt by @pseudotensor in #1262
- Remove HYDE accordion outputs if present before giving history to LLM, and remove chat=True/False for prompt generation, hold-over and led to bugs in prompting for gradio->gradio by @pseudotensor in #1263
- Better exceptions docview by @pseudotensor in #1264
- Fixes to Helm Chart by @EshamAaqib in #1269
- use docker compose with a Dockerfile to force rebuild if new by @achraf-mer in #1273
- Windows update Jan 8, 2024 by @pseudotensor in #1272
- minor package upgrades by @pseudotensor in #1275
- MistralAI by @pseudotensor in #1290
- Enforce allow_upload_to_user_data and allow_upload_to_my_data -- Fixes #1296 by @pseudotensor in #1297
- Update README_MACOS.md by @antoninadert in #1292
- [DOCS] readme minor readability improvements by @zainhaq-h2o in #1299
- Ensure parameters for OpenAI->h2oGPT are transcribed. by @pseudotensor in #1301
- Rotate image before OCR/DocTR - WIP by @pseudotensor in #1239
- Allow API call for conversion of text to audio by @pseudotensor in #1310
- Update README_MACOS.md by @antoninadert in #1308
- More API protection by @pseudotensor in #1314
- [HELM] Fix `PodLabels` by @EshamAaqib in #1318
- Update README_MACOS.md by @antoninadert in #1325
- exposing imagePullSecret and tag in values.yaml by @robinliubin in #1328
- Update QR code. by @arnocandel in #1336
- h2ogpt support namespaceOverride by @robinliubin in #1337
- Update docker for better vllm support, go to higher cuda for cuda kernels to exist by @pseudotensor in #1339
- Some package updates by @pseudotensor in #1344
- Add verifier -- only via API for now by @pseudotensor in #1267
- fix-33075_adding_shared_memory by @robinliubin in #1352
- Cu121 by @pseudotensor in #1368
- Add vision models as llms by @pseudotensor in #1369
- Upgrade to gradio4 3rd attempt by @pseudotensor in #1380
- Increase timeout when have failure to make sure we know the reason. by @pseudotensor in #1384
- Faster for llava by @pseudotensor in #1392
- Fixes #1270 by @pseudotensor in #1396
- [DOCS] Minor doc improvements by @zainhaq-h2o in #1402
- Fixes #1324 -- clear memory when browser tab closes by @pseudotensor in #1407
- Fix TEI use of HuggingFaceHubEmbeddings by @pseudotensor in #1424
- Fix login if chatbot counts differ from in auth file by @pseudotensor in #1429
- Improve auth/login for OpenAI API and fix AWQ by @pseudotensor in #1434
- GPT's user review functionality added by @Darshan-Malaviya in #1436
- [HELM] Add option to disable anti affinity by @EshamAaqib in #1423
- Update MacOS doc with information related to BFloat16 error by @Mathanraj-Sharma in #1442
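Several of the configuration changes above (#556, and the earlier SAVE_DIR work in #71) revolve around letting environment variables stand in for CLI arguments when the variable name starts with `H2OGPT_`. The sketch below illustrates that general idea only; the function name and mapping rules are illustrative assumptions, not the actual h2oGPT code:

```python
import os

PREFIX = "H2OGPT_"

def env_overrides(environ=None):
    """Collect H2OGPT_-prefixed environment variables into a kwargs dict.

    Hypothetical sketch of the prefix-to-argument idea; not h2oGPT's
    real implementation.
    """
    environ = os.environ if environ is None else environ
    overrides = {}
    for key, value in environ.items():
        if key.startswith(PREFIX):
            # e.g. H2OGPT_SAVE_DIR -> save_dir
            overrides[key[len(PREFIX):].lower()] = value
    return overrides

# Example with an explicit dict instead of the real environment:
print(env_overrides({"H2OGPT_SAVE_DIR": "/tmp/chats", "PATH": "/usr/bin"}))
# → {'save_dir': '/tmp/chats'}
```

In this scheme, `H2OGPT_SAVE_DIR=/tmp/chats` would play the same role as passing `--save_dir=/tmp/chats` on the command line.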
New Contributors
- @lo5 made their first contribution in #64
- @cpatrickalves made their first contribution in #63
- @jefffohl made their first contribution in #68
- @eltociear made their first contribution in #70
- @ChathurindaRanasinghe made their first contribution in #131
- @orellavie1212 made their first contribution in #132
- @hsm207 made their first contribution in #218
- @this made their first contribution in #242
- @fazpu made their first contribution in #256
- @3x0dv5 made their first contribution in #274
- @parkeraddison made their first contribution in #328
- @ernstvanderlinden made their first contribution in #377
- @0xraks made their first contribution in #375
- @cimadure made their first contribution in #387
- @wienke made their first contribution in #402
- @jllllll made their first contribution in #458
- @slycordinator made their first contribution in #487
- @jcatana made their first contribution in #547
- @mmalohlava made their first contribution in #543
- @zba made their first contribution in #659
- @anfrd made their first contribution in #663
- @ceriseghost made their first contribution in #698
- @ryanchesler made their first contribution in #752
- @patrickhwood made their first contribution in #756
- @muendelezaji made their first contribution in #778
- @iam4x made their first contribution in #780
- @Mins0o made their first contribution in #790
- @ffalkenberg made their first contribution in #819
- @tomkraljevic made their first contribution in #833
- @lamw made their first contribution in #892
- @lweren made their first contribution in #909
- @jamesbraza made their first contribution in #845
- @hemenkapadia made their first contribution in #941
- @squidwardthetentacles made their first contribution in #925
- @AniketP04 made their first contribution in #981
- @MSZ-MGS made their first contribution in #1016
- @us8945 made their first contribution in #1024
- @surenH2oai made their first contribution in #1031
- @ozahavi made their first contribution in #1044
- @overaneout made their first contribution in #1038
- @daanknoope made their first contribution in #1112
- @cherrerajobs made their first contribution in #1154
- @Blacksuan19 made their first contribution in #1206
- @antoninadert made their first contribution in #1292
- @Darshan-Malaviya made their first contribution in #1436
Full Changelog: https://github.com/h2oai/h2ogpt/commits/0.2.0