May 15, 2024

Self-Consistency

Enhancing Model Performance

When it comes to optimizing AI models, prompt engineering plays a crucial role in improving their performance. By carefully crafting prompts, you can guide the model's responses and enhance its overall accuracy and reliability. One effective technique in prompt engineering is self-consistency.

Importance of Self-Consistency

Self-consistency is an advanced technique in prompt engineering that has shown promising results in improving model performance. Proposed by Wang et al. in 2022, this technique focuses on enhancing CoT (Chain of Thought) prompting for tasks involving arithmetic and commonsense reasoning.

Let's try a zero-shot arithmetic prompt without advanced prompting:

Prompt:

When I was 6 my sister was half my age.
Now I’m 70. How old is my sister?

Output: 

35

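That answer is wrong. The sister was 3 when the narrator was 6, so she is 3 years younger, which makes her 67 when the narrator is 70. Now, let's use the few-shot exemplars from Table 17 in the study. The idea behind self-consistency is to sample multiple diverse reasoning paths for the same few-shot Chain-of-Thought prompt, rather than relying on a single completion, and then pick the answer that is most consistent across the samples.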

Prompt: 

Q: There are 15 trees in the grove. Grove workers will plant trees in the grove today. After they are done,
there will be 21 trees. How many trees did the grove workers plant today?
A: We start with 15 trees. Later we have 21 trees. The difference must be the number of trees they planted.
So, they must have planted 21 - 15 = 6 trees. The answer is 6.

Q: If there are 3 cars in the parking lot and 2 more cars arrive, how many cars are in the parking lot?
A: There are 3 cars in the parking lot already. 2 more arrive. Now there are 3 + 2 = 5 cars. The answer is 5.

Q: Leah had 32 chocolates and her sister had 42. If they ate 35, how many pieces do they have left in total?
A: Leah had 32 chocolates and Leah’s sister had 42. That means there were originally 32 + 42 = 74
chocolates. 35 have been eaten. So in total, they still have 74 - 35 = 39 chocolates. The answer is 39.

Q: Jason had 20 lollipops. He gave Denny some lollipops. Now Jason has 12 lollipops. How many lollipops
did Jason give to Denny?
A: Jason had 20 lollipops. Since he only has 12 now, he must have given the rest to Denny. The number of
lollipops he has given to Denny must have been 20 - 12 = 8 lollipops. The answer is 8.

Q: Shawn has five toys. For Christmas, he got two toys each from his mom and dad. How many toys does
he have now?
A: He has 5 toys. He got 2 from mom, so after that he has 5 + 2 = 7 toys. Then he got 2 more from dad, so
in total he has 7 + 2 = 9 toys. The answer is 9.

Q: There were nine computers in the server room. Five more computers were installed each day, from
monday to thursday. How many computers are now in the server room?
A: There are 4 days from monday to thursday. 5 computers were added each day. That means in total 4 * 5 =
20 computers were added. There were 9 computers in the beginning, so now there are 9 + 20 = 29 computers.
The answer is 29.

Q: Michael had 58 golf balls. On tuesday, he lost 23 golf balls. On wednesday, he lost 2 more. How many
golf balls did he have at the end of wednesday?
A: Michael initially had 58 balls. He lost 23 on Tuesday, so after that he has 58 - 23 = 35 balls. On
Wednesday he lost 2 more so now he has 35 - 2 = 33 balls. The answer is 33.

Q: Olivia has $23. She bought five bagels for $3 each. How much money does she have left?
A: She bought 5 bagels for $3 each. This means she spent $15. She has $8 left.

Q: When I was 6 my sister was half my age. Now I’m 70 how old is my sister?
A:

Output 1:

When I was 6 my sister was half my age, so she was 3. 
Now I am 70, so she is 70 - 3 = 67. 
The answer is 67.

Output 2:

When the narrator was 6, 
his sister was half his age, which is 3. 
Now that the narrator is 70, 
his sister would be 70 - 3 = 67 years old. 

The answer is 67.

Output 3:

When I was 6 my sister was half my age, 
so she was 3.
Now I am 70, so she is 70/2 = 35. 

The answer is 35.

The final answers differ, but 67 appears in two of the three outputs and 35 in only one. Taking the most frequent answer as the final one gives 67, which is correct.
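Picking the most frequent answer is easy to automate. The following is a minimal sketch, assuming every completion states its result with the phrase "The answer is N" as in the few-shot exemplars. The helper names `extract_answer` and `majority_answer` are hypothetical and do not come from the paper.

```python
import re
from collections import Counter

# The three sampled completions shown above, joined into single strings.
outputs = [
    "When I was 6 my sister was half my age, so she was 3. "
    "Now I am 70, so she is 70 - 3 = 67. The answer is 67.",
    "When the narrator was 6, his sister was half his age, which is 3. "
    "Now that the narrator is 70, his sister would be 70 - 3 = 67 years old. "
    "The answer is 67.",
    "When I was 6 my sister was half my age, so she was 3. "
    "Now I am 70, so she is 70/2 = 35. The answer is 35.",
]


def extract_answer(text):
    """Return the number after the last 'The answer is', or None if absent."""
    matches = re.findall(r"The answer is\s*\$?(-?\d+(?:\.\d+)?)", text)
    return matches[-1] if matches else None


def majority_answer(completions):
    """Return (most common answer, vote count), ignoring unparsable outputs."""
    answers = [a for a in map(extract_answer, completions) if a is not None]
    if not answers:
        return None, 0
    return Counter(answers).most_common(1)[0]


answer, votes = majority_answer(outputs)
print(f"{answer} ({votes} of {len(outputs)} votes)")  # prints "67 (2 of 3 votes)"
```

Completions that never state a final answer are simply skipped here. Whether to skip them or count them as a separate "no answer" vote is a design choice this sketch does not settle.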

Notes

Wang and colleagues also describe a more intricate way to choose the final answer, weighting each reasoning path by the probability the LLM assigns to it. However, weighted voting seems to typically yield similar or worse results than a simple majority vote. An attempt was also made to run the same self-consistency prompt on multiple models and take a majority answer across all of the outputs (an ensemble method). This, however, did not improve accuracy.
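To make the probability-weighted alternative concrete, here is a rough sketch of one way such a vote could be computed. It is not the paper's implementation: the log-probabilities and token counts are invented for illustration, the function name `weighted_vote` is hypothetical, and dividing the log-probability by the token count is just one possible normalization chosen for this example.

```python
import math
from collections import defaultdict

# Hypothetical samples: (final answer, summed token log-prob, token count).
# All numbers are made up for illustration only.
samples = [
    ("67", -12.0, 30),
    ("67", -15.0, 30),
    ("35", -9.0, 30),
]


def weighted_vote(samples):
    """Add a length-normalized probability to each answer's score.

    Assumes `samples` is non-empty. Returns (best answer, all scores).
    """
    scores = defaultdict(float)
    for answer, logprob, n_tokens in samples:
        # exp(mean log-prob per token) acts as the weight of this path.
        scores[answer] += math.exp(logprob / n_tokens)
    best = max(scores, key=scores.get)
    return best, dict(scores)


best, scores = weighted_vote(samples)
print(best)  # prints "67"
```

In this toy data the single "35" path has the highest individual weight, yet the two "67" paths together still win, which matches the observation that weighting often ends up agreeing with a plain majority vote.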

Self-Consistency vs Chain of Thought

In the study, self-consistency showed significant improvements over Chain of Thought prompting alone. The key benefit comes from majority voting: by taking the most common result across multiple sampled completions of the same prompt, you reduce the impact of outliers and incorrect responses. This improves the overall accuracy and reliability of the model's predictions.
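Put together, the whole procedure is a short loop: send the same prompt several times, parse each final answer, and keep the majority. The sketch below assumes a caller-supplied `sample_fn(prompt)` that returns one completion from a model sampled with some randomness (for example a temperature above zero). Both `sample_fn` and `self_consistency` are hypothetical names, and the canned stub stands in for a real model call so the example runs on its own.

```python
import re
from collections import Counter
from itertools import cycle


def self_consistency(prompt, sample_fn, n_samples=5):
    """Sample n_samples completions and return the majority final answer."""
    answers = []
    for _ in range(n_samples):
        completion = sample_fn(prompt)  # one sampled reasoning path
        match = re.search(r"The answer is\s*(-?\d+)", completion)
        if match:
            answers.append(match.group(1))
    if not answers:
        return None  # no completion stated a final answer
    # On a tie, Counter keeps the answer that was seen first.
    return Counter(answers).most_common(1)[0][0]


# Stand-in for a real model: cycles through canned completions.
_canned = cycle([
    "She was 3 then, so now she is 70 - 3 = 67. The answer is 67.",
    "She is 70 / 2 = 35. The answer is 35.",
    "The gap is 3 years, so 70 - 3 = 67. The answer is 67.",
])


def fake_sample(prompt):
    return next(_canned)


print(self_consistency("Q: ... how old is my sister?\nA:", fake_sample))  # prints "67"
```

With five samples the stub yields three votes for 67 and two for 35. A real setup would replace `fake_sample` with a call to whichever model API you use.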

Chain of Thought vs Self-Consistency

An ensemble method samples answers from several different models, but we won't cover it here because Self-Consistency outperforms it.

By incorporating Self-Consistency into your prompt engineering process, you can optimize the performance of AI models in tasks involving arithmetic and commonsense reasoning. This technique allows you to leverage diverse reasoning paths and select the most consistent answers, leading to more accurate and reliable outcomes.


Test out the prompts with AI/ML AI Playground.

We're excited to see what amazing projects you will bring to life. Happy prompting!
