Research Question: What are some ways of collecting instruction tuning training data and what are their advantages and their drawbacks?
Since I was the paper presenter for that week, I brought in a few follow-up papers on model-generated instruction sets and made the slides.
The presentation itself was given by another team member because of my reserve-forces training...
Scaling Instruction-Finetuned Language Models, published in JMLR in 2024, is a well-known paper that has already passed 2,000 citations, and two things about it struck me.
- First, as far as I can tell it mostly scales things up without much new academic insight, so how did it get past 2,000 citations? This gave me a bit of skepticism about recent LLM papers. Last semester, when I did an independent study on Diffusion models, the complicated math gave me headaches, but I remember enjoying papers that brought mathematically novel solutions that also worked well empirically. In the LLM field, though, even though it is clearly a hot area and the papers all have huge citation counts, most of them amount to "we did this and it worked; we don't know why," so even as I read and study papers I feel like I'm not building up deep insight of my own.
- A post collecting papers on the inner workings of LLMs - https://www.facebook.com/callee2006/posts/pfbid0WJRHQCsmBAq5ZGPbFEHLGW6QbKdaaftXSBX1neTEuy2Gvbxr62jX1kHFWAFeXEVUl
- Second, Jeff Dean is listed among the authors. For who Jeff Dean is, see: https://namu.wiki/w/%EC%A0%9C%ED%94%84%20%EB%94%98

First, let me discuss instruction datasets generated by language models.
As we saw in Tuesday's presentation, a language model can generate instructions for its own training, and the resulting model performed much better than the vanilla model.
This method is cost-effective because no human annotators are required, and language models trained on machine-generated instructions reached a similar level of performance to models trained on human-written instructions.
However, the quality of the generated instructions depends on the language model's own capability, so this approach can amplify the model's own weaknesses and biases.
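To make this concrete, here is a minimal sketch of a SELF-INSTRUCT-style bootstrapping loop. It is an illustration under my own assumptions, not the paper's implementation: `generate` is a hypothetical stand-in for any language-model call, and the canned return value and crude de-duplication exist only so the snippet runs end to end (SELF-INSTRUCT filters near-duplicates with ROUGE-L and also generates input-output instances, which I omit).

```python
import random

def generate(prompt: str) -> str:
    # Hypothetical stand-in for a real LM call (API or local model).
    # Returns canned instructions so the example runs without a model.
    return ("- Classify the sentiment of the review.\n"
            "- List three synonyms for the given word.")

def bootstrap_instructions(seed: list[str], rounds: int = 3) -> list[str]:
    """Grow an instruction pool by prompting the model with sampled examples."""
    pool = list(seed)
    for _ in range(rounds):
        examples = random.sample(pool, k=min(3, len(pool)))
        prompt = ("Here are some task instructions:\n"
                  + "\n".join(f"- {e}" for e in examples)
                  + "\nWrite new, diverse task instructions, one per line.")
        for line in generate(prompt).splitlines():
            candidate = line.lstrip("- ").strip()
            # Crude exact-match de-duplication; the paper uses ROUGE-L overlap.
            if candidate and candidate not in pool:
                pool.append(candidate)
    return pool

seed = ["Summarize the paragraph in one sentence.",
        "Translate the sentence into French."]
print(bootstrap_instructions(seed))
```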

There are more papers on model-generated instructions.
The first one is 'Instruction Tuning with GPT-4'. In this paper, instructions produced by one language model are graded by a separate language model, which diversifies away the risk of relying on a single model's own judgment.
They also had the language model generate five candidates and picked the instruction with the highest score.
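A minimal sketch of that best-of-five selection, with two hypothetical helpers standing in for real model calls: `generate` for the instruction writer and `judge_score` for the separate grader (faked with random numbers here so the snippet runs):

```python
import random

def generate(prompt: str) -> str:
    # Hypothetical generator model; returns a dummy candidate.
    return f"{prompt} (variant {random.randint(0, 999)})"

def judge_score(instruction: str) -> float:
    # Hypothetical separate grader model; here just a random score in [0, 1).
    return random.random()

def best_of_n(prompt: str, n: int = 5) -> str:
    """Generate n candidates and keep the one the grader scores highest."""
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=judge_score)

print(best_of_n("Write an instruction for a text classification task."))
```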
The second one is about various methods that increase the performance of instruction tuning.
They increased the number of tasks to over 1,800, scaled up the model size, and changed the instruction format to include chain-of-thought data.
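As a rough illustration of that format change (the field names and templates below are my own, not the paper's exact schema), the same training example can be rendered with and without a chain-of-thought rationale:

```python
example = {
    "question": "If a train travels 60 km in 1.5 hours, what is its average speed?",
    "rationale": "Speed = distance / time = 60 / 1.5 = 40 km/h.",
    "answer": "40 km/h",
}

def format_plain(ex: dict) -> str:
    # Standard instruction format: question in, answer out.
    return f"Q: {ex['question']}\nA: {ex['answer']}"

def format_cot(ex: dict) -> str:
    # Chain-of-thought format: the target includes the reasoning steps
    # before the final answer.
    return (f"Q: {ex['question']}\n"
            f"A: Let's think step by step. {ex['rationale']} "
            f"So the answer is {ex['answer']}.")

print(format_plain(example))
print(format_cot(example))
```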

The last one is about adding noise to generated instructions. The authors hypothesized that adding noise to the instructions can improve the model's performance on unseen tasks. They compare several perturbations (sketched in code below):
- Null: give the model no instruction at all.
- Opposite: tell the model to answer in the opposite manner.
- Rand Trunc: randomly truncate the words in the instruction.
- Trunc-Shuf: randomly truncate and shuffle the words in the instruction.
- Rand Words: replace every word with a random word.
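Here is a minimal sketch of those five perturbations under my own reading of the descriptions above; the exact truncation scheme and the wording of the 'Opposite' directive are assumptions, not the paper's implementation:

```python
import random

def null(instruction: str) -> str:
    # Null: drop the instruction entirely.
    return ""

def opposite(instruction: str) -> str:
    # Opposite: append a directive to answer in the opposite manner
    # (exact wording is an assumption).
    return instruction + " Answer in the opposite manner."

def rand_trunc(instruction: str) -> str:
    # Rand Trunc: keep only a random-length prefix of the words.
    words = instruction.split()
    return " ".join(words[:random.randint(1, len(words))])

def trunc_shuf(instruction: str) -> str:
    # Trunc-Shuf: randomly truncate, then shuffle the remaining words.
    words = rand_trunc(instruction).split()
    random.shuffle(words)
    return " ".join(words)

def rand_words(instruction: str, vocab=("apple", "orbit", "seven", "cloud")) -> str:
    # Rand Words: replace every word with a random word from a toy vocabulary.
    return " ".join(random.choice(vocab) for _ in instruction.split())

example = "Classify the sentiment of the following movie review."
for perturb in (null, opposite, rand_trunc, trunc_shuf, rand_words):
    print(f"{perturb.__name__}: {perturb(example)!r}")
```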

The authors reported that the 'Opposite' method showed the largest performance improvement.
They suggest that adding the 'Opposite' directive makes the model generate more varied outputs, so the language model is trained over a broader range of labels and its performance increases.