Ethical and Responsible Large Language Models: Challenges and Best Practices

Date: August 23, 2023
Duration: 1 HR

Miquel Noguer i Alonso
Founder, Artificial Intelligence Finance Institute

Nicole Konigstein
Head of AI and Quantitative Research, Quantmate


History of Language Models and AI (free video for ACM members)
Language Models (free video for ACM members)
Introduction to Transformers for NLP: With the Hugging Face Library and Models to Solve Problems (free book for ACM members)
Real-World Natural Language Processing (free book for ACM members)
Training a Language Model (free video for ACM members)

What are your thoughts on taking care with Large Language Models and their uses based on this TED talk?

To me, the use of ML and AI in military applications like those currently in use in Ukraine is a huge concern.

I just want to ask a basic question about AI. I understand that LLMs and good engineering have recently allowed quite interesting advances in AI. But I have been around for a while, since I was a graduate student working for John McCarthy at MIT, a long time ago.
Is there any difference between a clever piece of programming and AI?
In the press today, everything is AI.
But Turing certainly didn't think of it that way in his 1950 paper "Computing Machinery and Intelligence," which opens by asking "Can machines think?"
And McCarthy, as late as the early 2000s, still hesitated to say machines can "think."
So can any of the speakers today give me a way of deciding between a clever piece of programming on one hand, and a program that exhibits AI on the other hand?

Isn't a development period of around 25 years, from the 2000 dot-com phase to AI-automated development (a singular, critical point of control), a very short time compared to the most complex intelligence we know, nature, and its billions of years of taking, some say, the slowest possible approach?
When was the moment we became smarter than what made us?

Just a general question from an interested non-expert: I know a few parents who cannot control the amount of their children's social media consumption or its possible negative effects on them. Web 2.0 is only now showing its effects in an area (child safety) that is vital to all of us, and the results are bad. Isn't a very cautious approach necessary for all potentially far-reaching new inventions and technologies?

Very interesting and important approach to finding global common safety needs and measures, since many new technologies like AI or quantum computing/processing/transmission could be seen as possible WMDs. Maybe initiatives similar to nuclear disarmament and proliferation control are needed to provide a common safety net? Maybe only the generation after our kids' generation (Web 2.0) will have enough critical understanding of the dangers and benefits to take further steps?

I would like to hear your thoughts on the recent practice of training AI models on AI-generated content, a phenomenon already seen with AI-generated images, where it is stalling their rapid development. Would it be responsible to exclude AI-generated content from LLM training data, if we accept it as content when it is generated in the first place?

Also, if you have any insights on the ethical problem of using copyright-protected art to generate images that are then copyright-free, I'd be glad to hear them.

Thank you for your input.

The LLM knowledge modeling approaches currently being used are locked into a snapshot of the status quo and thus make it cheaper, faster, and easier to amplify that recent state of knowledge.

LLMs can provide a broader swath of the status quo to bear on an inquiry than any given actor might already have, and that is valuable.

But we know that the current state of knowledge is always seen in the historical rearview mirror as filled with ignorance.

The concern is that this amplification takes resources and attention away from improving the status quo.


Will the session recording be made available?

When we make automated models of human behavior it seems to me that “artificial” is the wrong word. “Intelligence modeling” seems closer to the realities, though it is not as catchy.

If we were to know what thinking is and use that as a comparison criterion, we would then be able to decide when something is thinking. Otherwise, all bets are off, no?

For the question about how to access the notebook for plotting these visualizations, and for further explanation of what these plots mean, you can refer to the repo from the talk: GitHub - Nicolepcx/ACM_TechTalk

For the questions about LIME: no, LIME is not just another black-box layer. It simplifies the model's behaviour around the specific point we are trying to explain, since an explanation is a local linear approximation of that behaviour. Further information on how this works in detail can be found here: GitHub - marcotcr/lime: Lime: Explaining the predictions of any machine learning classifier
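To make the "local linear approximation" point concrete, here is a minimal sketch of LIME's core idea in plain Python (not the real `lime` library): perturb the input by dropping words, query the black box on each perturbation, and estimate each word's local contribution. The toy classifier and its hidden weights are invented for illustration.

```python
# Minimal sketch of LIME's core idea: perturb locally, query the black box,
# and recover per-word importances. Not the real `lime` library.
import random

def black_box_sentiment(words):
    # Hypothetical black-box classifier: hidden word weights stand in
    # for an opaque model we can only query, never inspect.
    hidden = {"great": 2.0, "terrible": -2.0, "movie": 0.1}
    return sum(hidden.get(w, 0.0) for w in words)

def lime_style_importance(words, n_samples=500, seed=0):
    rng = random.Random(seed)
    present = {w: [0.0, 0] for w in words}  # score sum / count when word kept
    absent = {w: [0.0, 0] for w in words}   # score sum / count when dropped
    for _ in range(n_samples):
        mask = [rng.random() < 0.5 for _ in words]
        if not any(mask):
            continue  # skip the empty perturbation
        score = black_box_sentiment([w for w, m in zip(words, mask) if m])
        for w, m in zip(words, mask):
            bucket = present[w] if m else absent[w]
            bucket[0] += score
            bucket[1] += 1
    # Importance = mean score with the word minus mean score without it,
    # a diagonal approximation of LIME's local linear surrogate model.
    return {
        w: present[w][0] / max(present[w][1], 1)
           - absent[w][0] / max(absent[w][1], 1)
        for w in words
    }

imp = lime_style_importance(["great", "movie", "terrible"])
```

Running this recovers that "great" pushes the score up and "terrible" pushes it down, which is exactly the kind of local, per-feature explanation LIME produces.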

Yes, LIME is usable for text classification but not for generative cases. For generative cases, one should use logs and tables and evaluate different sampling and search methods for generating text, since these influence how "creative" the model will be with its response.
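One of the main sampling knobs worth logging and comparing is temperature. The sketch below, with made-up logits, shows how temperature reshapes a next-token distribution: low temperature sharpens it (more deterministic output), high temperature flattens it (more "creative" output).

```python
# Sketch: how temperature reshapes a next-token distribution.
# The logits are invented for illustration, not from any real model.
import math

def softmax_with_temperature(logits, temperature):
    # Divide logits by temperature before softmax: T < 1 sharpens the
    # distribution, T > 1 flattens it toward uniform.
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]

logits = [2.0, 1.0, 0.5]                       # hypothetical next-token logits
sharp = softmax_with_temperature(logits, 0.5)  # low T: near-greedy
flat = softmax_with_temperature(logits, 2.0)   # high T: more exploratory
```

Comparing `sharp` and `flat` side by side in a log or table is a simple way to evaluate how a sampling setting will affect generation before running it at scale.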

For the several questions about bias in the data, about RLHF, and about what values to choose: the general point I wanted to make was that LLMs are pre-trained on data we do not fully know, and this may introduce bias into our own applications. Hence, we first need to find this bias and then mitigate these inherent biases by using a diversified dataset, which involves:

• Sourcing data from a wide range of sources and domains
• Ensuring data collection includes underrepresented groups or minority perspectives
• Actively seeking and incorporating feedback from diverse users and stakeholders

Fine-tuning the model on a new dataset, using prompt engineering for zero- and few-shot learning, or using RLHF are all ways to mitigate this.
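As a starting point for the "first find the bias" step, here is a deliberately simplistic, hypothetical audit: count how often terms associated with different groups appear in a corpus to spot skewed representation before fine-tuning. The term lists and corpus are illustrative only; a real audit would use much richer lexicons and metrics.

```python
# Hypothetical minimal bias audit: count group-associated term frequencies
# in a text corpus to surface representation skew. Illustrative only.
from collections import Counter

def representation_counts(corpus, group_terms):
    # For each group, count occurrences of its associated terms
    # across all documents in the corpus.
    counts = Counter()
    for doc in corpus:
        tokens = doc.lower().split()
        for group, terms in group_terms.items():
            counts[group] += sum(tokens.count(t) for t in terms)
    return counts

corpus = [
    "He is a doctor and he works hard",
    "She is a nurse",
    "He leads the engineering team",
]
groups = {"male": ["he", "him"], "female": ["she", "her"]}
counts = representation_counts(corpus, groups)
```

A skew like the one this toy corpus shows (male terms outnumbering female terms) is the kind of signal that would prompt sourcing additional data from underrepresented groups before fine-tuning.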