A-Z.fi

Last modified: 2024-12-11

HTB - Lost In Hyperspace Writeup by McShooty

Challenge description:

"A cube is the shadow of a tesseract casted on 3 dimensions. I wonder what other secrets may the shadows hold."

Included File(s):

graph LR
    A[Lost In Hyperspace.zip] -- unzipped --> B[Tokenembeddings.npz]
    B -- unzipped --> C[Tokens.npy]
    B --> D[embeddings.npy]

Let's start by examining Tokens.npy:

> unzip Tokenembeddings.npz -d Tokenembeddings
> cd Tokenembeddings
> cat tokens.npy
TLETFL1EYWTV3{B834_DNL#IC-HAE4C5LWIO4{M!_ ... 6RBSHU%

Hmm, no help there yet, but note that the string we are given is 110 characters long (this will be important in a bit).

Now let's examine embeddings.npy. We can't just "cat" this file because it holds binary data and we only get gibberish, so let's use NumPy itself to see what the file contains:

import numpy as np

data = np.load("embeddings.npy")
np.save('data.npy', data)
print(data)
[[-0.38804208 -0.82960969  0.28498215 ...  0.84272571  0.18395522
   1.38538893]
 [ 0.23968969  0.49330195  0.29123344 ... -0.17795812 -0.34906333
  -0.36859668]
 ...
 [ 0.50472927 -0.88382723  0.79664604 ...  0.7187068  -0.18154163
   1.05677205]
 [ 0.05885257 -0.16288486 -0.00894871 ... -0.13074514 -0.11173366
   0.41113626]]

Process finished with exit code 0

This file is considerably larger than the first one, but luckily it is immediately apparent that it holds a matrix of m rows and n columns. Each row is a point with n coordinates, and since n is larger than 3 we cannot visualize this matrix in a conventional manner.
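Printing the shape settles the question without scrolling through the numbers. The following is a minimal sketch, assuming a random stand-in array sized like the real matrix turns out to be (110 by 512, per the MATLAB inspection further down); `describe` is a hypothetical helper written for this example, not a NumPy function:

```python
import numpy as np

def describe(matrix):
    """Summarize a 2-D array: row/column counts and whether it fits a 3D plot."""
    matrix = np.asarray(matrix)
    rows, cols = matrix.shape
    return {
        "rows": rows,
        "cols": cols,
        "dtype": str(matrix.dtype),
        # Each row is one point; it can be drawn directly only with <= 3 coordinates.
        "plottable_directly": cols <= 3,
    }

# Stand-in data (an assumption). With the challenge file you would use:
#   data = np.load("embeddings.npy")
data = np.random.default_rng(0).normal(size=(110, 512))
print(describe(data))
```

With the real file, `data.shape` alone gives the same answer; the helper only spells out why the column count is what matters for plotting.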

MATLAB

Let's transfer these files to MATLAB so we can manipulate them more easily.

Formatting to MATLAB files

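from scipy.io import savemat
import numpy as np
import glob
import os

# Convert every .npz archive in the current directory to a .mat file
npzFiles = glob.glob("*.npz")
for f in npzFiles:
    fm = os.path.splitext(f)[0] + '.mat'
    # An .npz archive maps array names to arrays; copy it into a plain dict
    d = dict(np.load(f))
    savemat(fm, d)
    print('generated ', fm, 'from', f)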

We should now have a new file, tokens_embeddings.mat (the script keeps the base name of the .npz file), that we can load into MATLAB.

MATLAB quickly shows us that both variables have the same number of rows (110), but the embeddings matrix has 512 columns while the tokens matrix has only one. Since the tokens matrix contains only characters, we can assume that each character in tokens belongs to the corresponding row in embeddings. We cannot visualize that, however, until we have reduced the matrix to 3 columns (3 dimensions).

Let's use a technique called Principal Component Analysis (PCA)

PCA

I would be lying if I said that I know exactly how this method works mathematically, but the core idea is that PCA finds the direction along which the data varies most, then the next such direction perpendicular to the first, and so on, one step at a time. Keeping only the first few directions leaves us with something we can work with.

For example:

3-Dimensional Data         2-Dimensional Data
       •                       •
     •   •        PCA        •   •
   • •   ••      --->       ••• •••
    ••••••                  ••••••
     •   •                    •  •

The data mostly keeps its information and the relationships between data points, but we reduce the number of dimensions. We lose some data, but that's life in 3D.

Thank the math gods for MATLAB, so we do not have to do this by hand.

In MATLAB:
    >> [coeff, score] = pca(embeddings);
    >> embeddings_3d = score(:, 1:3);
    >> scatter3(embeddings_3d(:,1), embeddings_3d(:,2), embeddings_3d(:,3))
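For readers without MATLAB, roughly the same projection can be sketched in NumPy. This is not the method used above, just an approximation of it: the data is centered (MATLAB's pca also does this by default) and the principal directions are taken from a singular value decomposition. `pca_project` is a hypothetical name, the input is synthetic stand-in data, and the signs of the components may differ from MATLAB's output, which mirrors the plot but does not change its shape.

```python
import numpy as np

def pca_project(X, n_components=3):
    """Project the rows of X onto the first n_components principal directions."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)          # PCA works on mean-centered data
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, S, Vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ Vt[:n_components].T
    variance_ratio = (S ** 2) / (S ** 2).sum()
    return scores, variance_ratio[:n_components]

# Synthetic stand-in (an assumption): 3-D structure hidden in 512 dimensions.
# With the challenge file you would use: X = np.load("embeddings.npy")
rng = np.random.default_rng(0)
latent = rng.normal(size=(110, 3))
mixing = rng.normal(size=(3, 512))
X = latent @ mixing + 0.01 * rng.normal(size=(110, 512))

scores, ratio = pca_project(X)
print(scores.shape, ratio)
```

The returned ratios show how much of the total variance each kept direction explains, which is a quick way to judge how much is lost by dropping to 3 dimensions.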

We now have the embeddings visualized in 3D space. At first it doesn't look like much...

but check this Sh*t out

Now it's just a matter of mapping every character in tokens to the corresponding point in our S-P-I-R-A-L.

And best of all, we can do that using a for loop:

>> for i = 1:length(tokens)
       text(embeddings_3d(i,1), embeddings_3d(i,2), embeddings_3d(i,3), string(tokens(i,1)))
   end
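The same labeling step could be sketched in Python as follows. Both function names are hypothetical, the demo input is made up, and the plotting part assumes matplotlib is installed (version 3.2 or newer for `add_subplot(projection="3d")`); it is imported lazily so the pairing helper runs without it.

```python
import numpy as np

def label_points(tokens, points_3d):
    """Return [(label, x, y, z), ...], pairing token i with 3D point i."""
    labels = np.asarray(tokens).astype(str).ravel().tolist()
    points = np.asarray(points_3d, dtype=float)
    if points.ndim != 2 or points.shape[1] != 3 or len(labels) != len(points):
        raise ValueError("need exactly one 3D point per token")
    return [(lab, *map(float, p)) for lab, p in zip(labels, points)]

def plot_labeled(tokens, points_3d):
    """Scatter the points and write each token at its point (needs matplotlib)."""
    import matplotlib.pyplot as plt  # lazy import: only needed for drawing
    pts = np.asarray(points_3d, dtype=float)
    fig = plt.figure()
    ax = fig.add_subplot(projection="3d")
    ax.scatter(pts[:, 0], pts[:, 1], pts[:, 2])
    for lab, x, y, z in label_points(tokens, pts):
        ax.text(x, y, z, lab)
    plt.show()

# Made-up demo input: three tokens at three arbitrary points.
print(label_points(np.array(list("HTB")), np.eye(3)))
```

Calling `plot_labeled(tokens, embeddings_3d)` with the real tokens and the 3-column PCA result would open an interactive window that can be rotated, much like the MATLAB figure.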

And we get the flag.

FunkyHotspot HTB