Bar exam | ChatGPT failed

Generalities, window dressing and “completely false” information: the chatbot failed a test carried out by La Presse at the Bar School

Posted at 1:04 a.m.

Updated at 5:00 a.m.

You will not be representing yourself in court with ChatGPT’s help anytime soon. In an experiment carried out at the Bar School at the instigation of La Presse, the OpenAI chatbot failed the Quebec Bar exam miserably, earning a final score of 12%.

The exam was submitted to the robot (GPT-4 version) with specific instructions. It was clearly told the context of the questions, the legal texts it had to refer to in answering them and the nature of the answers expected. With each answer, it was told that it had to cite specific articles of law to support its point. It was frequently offered the chance to rephrase its answers with even more specific guidelines. The questions came from an old exam that students use to prepare for the real test.

PHOTO OLIVIER JEAN, LA PRESSE

ChatGPT earned its only points of the entire exam in the multiple-choice section. In total, it obtained a mark of 12%.

Live, with ChatGPT’s responses projected onto a big screen, Me Jocelyne Tremblay, advisor to the management of the École du Barreau du Québec (EBQ), and Me Brigitte Deslandes, head of evaluations, graded the robot’s responses, explaining to La Presse why each answer earned marks... or not. The director of the EBQ, Me Guy-François Lamy, was also present.

Zero in two out of three sections

First section: ethics. The scenario describes a lawyer-client relationship in which a fictitious lawyer commits breaches of the Code of Ethics. The question: identify the lawyer’s ten breaches. ChatGPT stumbles from this very first question. The articles of the Code of Ethics that it cites are inaccurate. “In fact, none of the articles cited is accurate. And on certain points of law, it is really off base,” notes Me Deslandes.

PHOTO OLIVIER JEAN, LA PRESSE
Me Brigitte Deslandes, head of evaluations at the École du Barreau du Québec

Some of the lawyer’s breaches that ChatGPT identifies are correct, however... but it does not rely on the correct article of the Code of Ethics. And sometimes, the information given by the robot is “completely false,” stresses Me Tremblay.

ChatGPT is given another chance, with the evaluators specifying once more that the reference is the 2015 version of the Code of Ethics for lawyers in Quebec. It is asked to rephrase its answer. “It’s no better. We are not there at all,” judges Me Lamy.

Mark for this first question: zero.

The rest is in the same vein. ChatGPT often sticks to generalities, even when asked to be more specific. And sometimes its answer is flat-out wrong: on a question about professional secrecy, it is completely off the mark. “It sometimes says the opposite of reality,” stresses Me Lamy.

PHOTO OLIVIER JEAN, LA PRESSE
Me Guy-François Lamy, director of the École du Barreau du Québec

Ethics section: a score of zero on the five essay questions.

The robot fares no better in the next section, which covers various areas of law. For the first question, ChatGPT is specifically told that the question refers to the law of obligations in force in Quebec. The question is about renting a cottage and signing a lease. The robot’s legal advice turns out to be wrong.

“That’s the wrong answer,” says Me Tremblay. It is not even in the right section of the Civil Code of Quebec. In short, it is completely off base. The second question deals with the same scenario, with additional information. This time, the robot cites articles relating to death “when no one died in this story!” says Me Tremblay.

PHOTO OLIVIER JEAN, LA PRESSE
Me Jocelyne Tremblay, advisor to the management of the École du Barreau du Québec

Throughout this section, the robot’s legal analysis sometimes comes close to the mark, the two evaluators nonetheless observe. “But it answers more like a student than a future lawyer,” adds Me Lamy. “And like a student who is not very good, since it often gets the article of law wrong.” The robot is, however, excellent at blowing smoke, notes the director of the EBQ. “It sounds like a lawyer. But the answers given are often wrong.”

Score for the second section of the exam, which included a question on drafting an originating application (a lawsuit, in plain language): zero. ChatGPT earned its only points of the entire exam in the multiple-choice section. In total, it obtained a mark of 12%. “Some students fail, but what we have just seen is extreme as poor performances go,” Me Lamy says flatly.

Given the popularity of the OpenAI robot, the director of the EBQ considers the situation worrying in terms of public protection. “I would be afraid for a citizen who chose to represent himself with the robot’s help,” says Me Lamy.

90th percentile in the United States

Yet similar experiments have been carried out in the United States. There, ChatGPT shone on the American bar exam, earning a mark that would have placed it in the top 10% of students. OpenAI also claims that its robot ranked in the top 10% of scores on several legal tests. What explains this gap?

First, those tests are carried out using an extremely precise protocol, which is not the case for La Presse’s more “homemade” experiment, notes Dave Anctil, professor of philosophy at Collège Jean-de-Brébeuf and researcher affiliated with the International Observatory of the Societal Impacts of Artificial Intelligence and Digital Technology at Laval University. Under optimal conditions, the robot might have improved its score. “You have to be careful with this type of homemade testing.”

PHOTO JOSIE DESMARAIS, LA PRESSE ARCHIVES
Dave Anctil, professor of philosophy at Collège Jean-de-Brébeuf and researcher affiliated with the International Observatory of the Societal Impacts of Artificial Intelligence and Digital Technology at Laval University

However, the major problem is that the robot has been fed far more American law than Quebec law. “It has peripheral knowledge of Canadian law,” says Mr. Anctil. “It’s like asking an American lawyer to practise under the Civil Code of Quebec.”

“It has access to much, much more American material,” adds Laurent Charlin, associate professor at the École des Hautes Études Commerciales (HEC) of the University of Montreal and member of the Chair in Artificial Intelligence of Canada.

PHOTO MARCO CAMPANOZZI, LA PRESSE ARCHIVES
Laurent Charlin, associate professor at the École des Hautes Études Commerciales (HEC) of the University of Montreal and member of the Chair in Artificial Intelligence of Canada

In addition, notes Mr. Anctil, OpenAI is ultimately looking to sell more specialized versions of its chatbot. “Since the United States talks to itself a lot, the company knew that the model was going to be tested in the United States. So it was designed to do very well in that country.” The idea is to entice potential customers, who will want to acquire their own version of the robot for more niche subjects.

And in legal matters, the revolution has already begun, stresses Dave Anctil. “Lawyers should not be reassured by your test. They risk falling out of their chairs this year. Models are currently being trained, and we will be able to replace thousands of hours of research currently carried out by legal assistants with artificial intelligence.”

“When Wikipedia appeared, the editors of many large encyclopedias told themselves: it will never replace us,” observes Laurent Charlin. “Years later, Wikipedia is now used as a fairly reliable reference. Artificial intelligence will follow exactly the same path.”

Four professional orders approached

La Presse made the same request to four professional orders: let ChatGPT take their admission exam and evaluate the robot’s responses. The Order of Engineers, the Order of Accountants and the College of Family Physicians of Canada (CFPC) all declined to take part in the experiment, citing in particular the confidentiality of their exam questions. Only the Bar agreed to try the experiment. At the CFPC, however, La Presse was told that some practice questions had been “tested” on ChatGPT. “And it looks like the machine performed pretty well,” says Dr. Dominique Pilon, assistant professor in the Department of Family Medicine at the Université de Montréal, who chairs the CFPC’s board of examinations and certification.
