HTB - Like-A-Glove Writeup by McShooty
Challenge Description
Words carry semantic information, and similar to how people can infer meaning based on a word's context, AI can derive representations for words based on their context too. However, the kinds of meaning that a model uses may not match our own. In this challenge, we've encountered a pair of AIs communicating in metaphors that are challenging to decode! The embedding model used is GloVe (Global Vectors for Word Representation), specifically the glove-twitter-25 variant.
Key Points:
- The AI model used in this challenge is glove-twitter-25. To recover the flag, we will use the same model.
Analyzing the Provided File
Input File: chal.txt
The file contains several lines formatted as follows:
Like <word1> is to <word2>, <word3> is to?
Here are a few examples from the file:
Like non-mainstream is to efl, battery-powered is to?
Like sycophancy is to بالشهادة, cont is to?
Like беспощадно is to indépendance, rs is to?
Like ajaajjajaja is to hahahahahahahahaahah, 2 is to?
...
Like raving is to سگن, happy is to?
Considering that there are two AIs communicating with each other, we can infer that they are exchanging the flag in this peculiar manner.
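The line format shown above is regular enough to pull apart with a single regular expression. The following is a minimal sketch, not part of the original challenge files: the helper name `parse_line` and the sample list are illustrative assumptions, and only ASCII samples are used to keep the example short (non-ASCII tokens match the same way, since `.` matches any character).

```python
import re

# Pattern for "Like <word1> is to <word2>, <word3> is to?"
ANALOGY_RE = re.compile(r"Like (.+?) is to (.+?), (.+?) is to\?")


def parse_line(line):
    """Return (word1, word2, word3), or None if the line does not fit the format."""
    match = ANALOGY_RE.fullmatch(line.strip())
    return match.groups() if match else None


samples = [
    "Like non-mainstream is to efl, battery-powered is to?",
    "Like ajaajjajaja is to hahahahahahahahaahah, 2 is to?",
    "this line does not follow the format",
]

parsed = [parse_line(s) for s in samples]
for triple in parsed:
    print(triple)
```

Returning `None` for malformed lines lets the caller decide whether to skip or report them.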
To tackle this challenge, it helps to picture word embeddings as points on a two-dimensional plane, even though real embeddings live in a higher-dimensional space (25 dimensions for glove-twitter-25):

Each of these points can be treated as a vector in space. For example, "hackthebox" might be represented as the point $(1.36, 2.48)$. Our goal is to find the vector corresponding to the word that replaces the question mark in the sentence:
Like non-mainstream is to efl, battery-powered is to?
Visualizing the Analogy
We can visualize the relationship between the words:

By computing the offset (the vector difference) between the first two words and applying that same offset to the third word, we can identify the word that stands in a similar relationship to the third word.
Mathematical Calculations
To express this mathematically, if we denote the word we are looking for as $\vec{x}$ (representing word4), we can derive the relationship as follows:
$$ \vec{x} \approx \vec{word2} - \vec{word1} + \vec{word3} $$
This formula indicates that we find $\vec{x}$ by applying the transformation from word1 to word2 onto word3.
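To make the arithmetic concrete, here is a minimal sketch with invented two-dimensional vectors. The words, the coordinates, and the helper name `analogy_vector` are illustrative assumptions and are not taken from glove-twitter-25, whose real vectors have 25 dimensions.

```python
# Toy 2-D vectors; the words and coordinates are invented for illustration.
toy_vectors = {
    "man": (1.0, 1.0),
    "woman": (1.0, 3.0),
    "king": (4.0, 1.0),
    "queen": (4.0, 3.0),
}


def analogy_vector(word1, word2, word3, vectors):
    """Return word2 - word1 + word3, computed component by component."""
    v1, v2, v3 = vectors[word1], vectors[word2], vectors[word3]
    return tuple(b - a + c for a, b, c in zip(v1, v2, v3))


# "Like man is to woman, king is to?"
x = analogy_vector("man", "woman", "king", toy_vectors)
print(x)  # (4.0, 3.0), which is exactly the toy vector for "queen"
```

In a real model the result rarely lands exactly on a word, which is why a nearest-neighbour search is needed as the next step.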
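The computed vector will rarely coincide exactly with a word in the vocabulary, so we pick the word whose vector points in the most similar direction. This is measured with cosine similarity, defined as:
$$ \text{cosine\_similarity}(\vec{a}, \vec{b}) = \frac{\vec{a} \cdot \vec{b}}{|\vec{a}| |\vec{b}|} $$
where $\cdot$ denotes the dot product, and $|\vec{a}|$ and $|\vec{b}|$ are the magnitudes (or norms) of the vectors. The vocabulary word with the highest cosine similarity to the computed vector is the hidden word.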
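The formula translates directly into a few lines of plain Python. This is a sketch for illustration only: the function names `cosine_similarity` and `nearest_word` and the three-entry toy vocabulary are assumptions, and a real library would do this with vectorised operations over the whole vocabulary.

```python
import math


def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def nearest_word(query, vocab, exclude=()):
    """Return the vocabulary word whose vector is most similar to query."""
    candidates = {w: v for w, v in vocab.items() if w not in exclude}
    return max(candidates, key=lambda w: cosine_similarity(query, candidates[w]))


# Toy vocabulary with invented coordinates.
vocab = {
    "a": (1.0, 0.0),
    "b": (0.0, 1.0),
    "c": (1.0, 1.0),
}

best = nearest_word((2.0, 1.9), vocab)
print(best)  # c
```

The optional `exclude` argument is there because the words that appear in the question are often the closest neighbours of the result and may need to be skipped.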
Utilizing Existing Tools
Fortunately, we do not need to do these calculations by hand. Gensim, a Python library, can load the glove-twitter-25 model and perform both the vector arithmetic and the similarity search for us.
The Script
Here's how the script is structured:
- Import the Model:
import re
import gensim.downloader as api
- Load the Model:
def load_model(model_name='glove-twitter-25'):
    model = api.load(model_name)
    return model
- Retrieve the Word Vector:
def get_word_vector(model, word):
    try:
        vector = model[word]
        return vector
    except KeyError:
        return None
- Extract Words Using Regular Expressions:
match = re.match(r"Like (.+?) is to (.+?), (.+?) is to\?", line.strip())
- Calculate the Analogy:
if match:
    word1, word2, word3 = match.groups()
    vector1 = get_word_vector(model, word1)
    vector2 = get_word_vector(model, word2)
    vec_target = get_word_vector(model, word3)
    if vector1 is not None and vector2 is not None and vec_target is not None:
        analogy_vector = vec_target + (vector2 - vector1)
        result = model.similar_by_vector(analogy_vector, topn=1)
        print(f"'{word1} is to {word2} as {word3} is to {result[0][0]}' with similarity {result[0][1]}")
Full Script Example
The complete script looks like this:
import re

import gensim.downloader as api


def load_model(model_name='glove-twitter-25'):
    # The model is downloaded on first use and loaded from the local cache afterwards.
    model = api.load(model_name)
    return model


def get_word_vector(model, word):
    # Return the word's vector, or None if the word is not in the vocabulary.
    try:
        vector = model[word]
        return vector
    except KeyError:
        return None


def process_line(line, model):
    # Solve one analogy line and return the answer word, or None on failure.
    match = re.match(r"Like (.+?) is to (.+?), (.+?) is to\?", line.strip())
    if match:
        word1, word2, word3 = match.groups()
        vector1 = get_word_vector(model, word1)
        vector2 = get_word_vector(model, word2)
        vec_target = get_word_vector(model, word3)
        if vector1 is not None and vector2 is not None and vec_target is not None:
            # Apply the word1 -> word2 offset to word3.
            analogy_vector = vec_target + (vector2 - vector1)
            result = model.similar_by_vector(analogy_vector, topn=1)
            print(f"'{word1} is to {word2} as {word3} is to {result[0][0]}' with similarity {result[0][1]}")
            return result[0][0]
        else:
            missing_words = [word for word, vec in zip([word1, word2, word3], [vector1, vector2, vec_target]) if vec is None]
            print(f"The following words were not found in the model: {', '.join(missing_words)}")
    else:
        print(f"Line format is incorrect: {line.strip()}")
    return None


def process_file(filename, model):
    # Collect the answer of every solvable line, in file order.
    answers = []
    with open(filename, 'r', encoding='utf-8') as file:
        for line in file:
            answer = process_line(line, model)
            if answer is not None:
                answers.append(answer)
    return answers


def main():
    model = load_model()
    filename = 'chal.txt'
    # Assumption: each answer is one piece of the flag, in file order.
    flag = "".join(process_file(filename, model))
    print(flag)


if __name__ == "__main__":
    main()
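One caveat with the script above: `similar_by_vector` searches the whole vocabulary, so its top hit can be one of the words from the question itself, typically word3. Gensim's `KeyedVectors.most_similar` takes `positive` and `negative` word lists, performs the same style of arithmetic on normalised vectors, and leaves the input words out of the results. The sketch below is an alternative, not the method used in this writeup. The names `solve_line` and `solve_file` are hypothetical, and `model` is assumed to be the object returned by `api.load('glove-twitter-25')`. Whether it produces the intended flag characters for this challenge has not been verified here.

```python
import re

ANALOGY_RE = re.compile(r"Like (.+?) is to (.+?), (.+?) is to\?")


def solve_line(model, line):
    """Return the top analogy answer for one line, or None if it cannot be solved."""
    match = ANALOGY_RE.match(line.strip())
    if match is None:
        return None
    word1, word2, word3 = match.groups()
    try:
        # word2 - word1 + word3, ranked by cosine similarity;
        # the three input words are left out of the results.
        results = model.most_similar(positive=[word2, word3], negative=[word1], topn=1)
    except KeyError:
        # Raised when one of the words is not in the model's vocabulary.
        return None
    return results[0][0] if results else None


def solve_file(model, lines):
    """Solve every line in order; unsolved lines yield None."""
    return [solve_line(model, line) for line in lines]
```

With a loaded model, `solve_file(model, open('chal.txt', encoding='utf-8'))` would return one answer per line, which can then be inspected or joined.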
Conclusion
By utilizing the GloVe embedding model and understanding vector arithmetic, we successfully decipher the hidden messages exchanged between the AIs. This challenge showcases the power of word embeddings in capturing semantic relationships and enables us to find the flag hidden within the metaphorical dialogue.
If you have any questions or need further clarification on specific parts, feel free to reach out!