Hugging Face Releases Free ChatGPT Clone: HuggingChat


Hugging Face, a machine learning community and AI tools platform, announced the release of HuggingChat. HuggingChat is an open source ChatGPT clone that anyone can use or download for themselves.

Hugging Face

Hugging Face is a company and an AI community. It provides access to free and open source tools for developing machine learning and AI apps.

One of Hugging Face’s recently completed projects is BLOOM, a large language model with 176 billion parameters that is available to anyone who agrees to follow a responsible AI license.

The platform offers access to open source models in different categories, including multimodal, vision, audio, natural language processing, and reinforcement learning.

Hugging Face also hosts open source datasets and libraries and serves as a way for teams to collaborate, including repositories similar to GitHub.

Many of its services are available at Free, Pro, and Enterprise levels.

HuggingChat

The HuggingChat ChatGPT clone is based on the Open Assistant conversational AI model.

Open Assistant itself is a project of the non-commercial Large-scale Artificial Intelligence Open Network (LAION).

LAION is a global non-profit organization dedicated to providing access to cutting-edge technologies as open source.

LAION writes:

“Our Belief
We believe that machine learning research and its applications can have a huge positive impact on our world and therefore need to be democratized.

Our Main Goals
Releasing open datasets, code, and machine learning models.

We want to teach the fundamentals of large-scale ML research and data management.

By making models, datasets, and code reusable without having to constantly train from scratch, we hope to promote efficient use of energy and computing resources to meet the challenges of climate change.”

The GitHub page for the Open Assistant chat model states:

“Open Assistant is a project to make a great chat-based large language model accessible to everyone.

By doing this, we believe we can revolutionize language innovation.

We hope that Open Assistant will help improve the world by improving language itself, much like Stable Diffusion has helped the world create art and images in new ways.”

HuggingChat Training Dataset

HuggingChat was trained on the recently released OpenAssistant Conversations Dataset (OASST1), with data collected up to April 12, 2023.

The dataset research paper, dated April 2023, is titled OpenAssistant Conversations – Democratizing Large Language Model Alignment (PDF).

The model uses Reinforcement Learning from Human Feedback (RLHF), the same training methodology used by OpenAI.

RLHF relies on high-quality, human-annotated and quality-rated question-and-answer datasets that can be used to train an AI to follow instructions.
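As a rough illustration of how quality-rated answers can feed this kind of training, the sketch below turns several rated answers to one question into (prompt, chosen, rejected) preference triples, the form of data commonly used to train a reward model. The function name, the example answers, and the numeric scores are hypothetical and are not taken from the Open Assistant code or dataset.

```python
from itertools import combinations

def preference_pairs(prompt, rated_answers):
    """Turn human-rated answers into (prompt, chosen, rejected) triples.

    rated_answers is a list of (answer_text, score) tuples; the scores
    stand in for human quality ratings (hypothetical values).
    """
    pairs = []
    for (a, score_a), (b, score_b) in combinations(rated_answers, 2):
        if score_a == score_b:
            continue  # a tie carries no preference signal
        chosen, rejected = (a, b) if score_a > score_b else (b, a)
        pairs.append((prompt, chosen, rejected))
    return pairs

example = preference_pairs(
    "What is the capital of France?",
    [
        ("Paris.", 0.9),
        ("I think it is Lyon.", 0.2),
        ("France is in Europe.", 0.4),
    ],
)

for triple in example:
    print(triple)
```

Each triple says only that humans preferred one answer over another for the same prompt. A reward model trained on such comparisons can then score new answers during the reinforcement learning stage.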

This release advances the project's goal of making the RLHF method available to anyone who wants to train an AI.

The research paper states:

“To democratize research on large-scale alignment, we are releasing OpenAssistant Conversations, a human-generated and human-annotated assistant-style conversation corpus containing 66,497 conversation trees in 35 different languages, annotated with 461,292 quality ratings.”
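The tree layout mentioned in the quote can be pictured with a small sketch: each message may have several competing replies, so one prompt fans out into multiple conversation threads. The class, field names, and ratings below are assumptions made for illustration and do not reproduce the actual dataset schema.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Message:
    role: str             # assumed labels: "prompter" or "assistant"
    text: str
    rating: float = 0.0   # hypothetical aggregated quality score
    replies: List["Message"] = field(default_factory=list)

def threads(node, prefix=()):
    """Yield every root-to-leaf path in the tree as a tuple of messages."""
    path = prefix + (node,)
    if not node.replies:
        yield path
        return
    for reply in node.replies:
        yield from threads(reply, path)

def best_thread(root):
    """Follow the highest-rated reply at each level of the tree."""
    path = [root]
    while path[-1].replies:
        path.append(max(path[-1].replies, key=lambda m: m.rating))
    return path

root = Message("prompter", "How do I reverse a list in Python?", replies=[
    Message("assistant", "Use my_list[::-1].", rating=0.8, replies=[
        Message("prompter", "Does that copy the list?"),
    ]),
    Message("assistant", "You cannot.", rating=0.1),
])

all_threads = list(threads(root))
best = [m.text for m in best_thread(root)]
print(len(all_threads), best)
```

Storing conversations this way keeps every alternative reply alongside its rating, which is what allows the same data to serve both as plain training dialogues and as ranked comparisons.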

The dataset is the result of a worldwide crowdsourcing effort involving over 13,000 volunteers.

Crowdsourcing has been a great way to generate multilingual training data that contributes to high-quality datasets.

However, according to the researchers, the crowdsourcing approach also introduced limitations in dataset quality in the form of cultural and subjective biases of the individuals who created and evaluated the training data.

They also warned that more engaged participants tended to contribute more, resulting in an uneven distribution of their values and biases.

The researchers conclude that the dataset may not represent the diversity of perspectives of all contributors.

For example, they posted a survey in their Discord channel (English only) asking open source contributors demographic questions (though not about ethnicity).

Language bias aside, the survey found that of the 226 respondents, 201 identified as male, 10 as female, and 5 as non-binary/other, while 10 declined to answer.
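To put those survey counts in proportion, the short calculation below converts them into percentages. The counts come from the survey figures above; the variable names are just for illustration.

```python
# Survey counts as reported in the article.
responses = {
    "male": 201,
    "female": 10,
    "non-binary/other": 5,
    "declined to answer": 10,
}

total = sum(responses.values())

# Share of each group, as a percentage rounded to one decimal place.
shares = {
    group: round(100 * count / total, 1)
    for group, count in responses.items()
}

for group, share in shares.items():
    print(f"{group}: {share}%")
```

Run as written, this shows that close to nine in ten respondents identified as male, which illustrates the skew the researchers warn about.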

The researchers cannot guarantee that the dataset is entirely free of harmful content, but they stand by it because it was created following strict quality guidelines.

The researchers wrote:

“To ensure the quality of our datasets, we have established strict contributor guidelines that all users must follow.

These guidelines are designed to prevent harmful content from being added to datasets and to encourage contributors to produce quality responses.”

HuggingChat Is Available

HuggingChat is now open to users. No registration or login account is required.

Don’t expect ChatGPT-level output. The service is not at that level yet. The app’s page lists it as version 0.0, which gives you an idea of how mature it is at this point.

Nonetheless, it is an impressive first step for the open source community, and it is free of charge.

Visit the HuggingChat webpage here.

HuggingChat webpage and user interface




