1 Introduction
Highdimensional data visualization is a very important problem for human to sense the data. Currently, the state of art methods are tSNE (
tSNE, BHSNE) and UMAP (UMAP), which has similar principle for the nonlinear low dimension reduction. They use neighborhood probability distribution to connect the highdimensional data points to lowdimensional map points, which try to make the local relative neighborhood relation unchanged but ignoring the change in the macro structure of the data. However, this may make the low dimension map points representing the highdimensional structure unfaithfully. In the lowdimensional neighborhood keeping and patching process, tSNE sometimes will make the neighborhood relations in the highdimensional structure break in the the lowdimensional space. We add a macro loss term on the loss of tSNE to make it keep the relative kmeans centroids structure in the low and high dimensional space, which basically keep the macro structure unchanged in the low dimensional space.
2 Methods
We now begin to derive the loss function of the global tdistributed stochastic neighborhood embedding (GTSNE). Suppose that there are
points in the highdimensional space, , where . We want to get their lowdimensional embedding map points, , where , where .2.1 The Loss of tSNE
Recall that the loss function of tSNE is given by
(1) 
where
is the probability of high dimensional point
connecting to , and is the probability of low dimensional point connecting to . The probability and characterize the neighborhood relation of and . The close two points will have a higher probability than those two farseparated points. The key is that we need to seek the probability distributions in both the highdimensional space and low dimensional space, such that they can match each other, i.e. when we will have the best layouts in the low dimensional space.tSNE use the Gaussian probability to model the neighborhood relations in the high dimensional space, i.e.
(2) 
.
(3) 
where the was chosen such that it satisfies the perplexity equation
(4) 
. Intuitively, this means that the probability can effectively distinguish Perplexity neighbors of .
To solve the crowding problem, i.e. when make that the distance relation keeping in the low dimensional space, it will make that the mediately separate points in the high dimensional space clustering together in the low dimension space, tSNE uses the heavy tail tDistribution to model the low dimensional neighborhood relations.
(5) 
for , and .
In the above formulation, tSNE only captures the local neighborhood relations in the low dimension embeddings. In our numerical experiments, we find the tSNE map points can not faithfully represent the highdimensional data points. There exist two problems. The first one is that tSNE can not fully preserve the local neighborhood relation. This occurs when two neighbor points were separated by a line in the map points in the low dimension layout, tSNE will separate the two points on the two sides of the line and push them far away from the line. Note that the problem is due to that the tSNE loss is nonconvex, which is hard to optimize. Once a line lies in the middle of the near points, it is hard to push the line far away from the two points. The second problem is that tSNE can not preserve the macro structure of data, e.g, it will project a three dimensional sphere into 2D space but do not own a circle boundary. To overcome the above two problems, we propose the following GTSNE loss, which will consider both the local neighborhood structure, and also the macro structure of the data points.
2.2 Global tDistributed Stochastic Neighbor Embedding
To characterize the global structure of the high dimensional points, we use the kmeans clustering centroids in the high dimension space, their neighborhood relations and the data point with the centroids relations to represent the macro structures. To do this, we first run PCA on to get the PCA embedding , where . Then we run kmeans clustering algorithm on to get the kmeans centroids , where . The kmeans centroids capture the global structure of the data points. For each point , we calculate the probability that point belong to the cluster by the tDistributed distribution denoted by ,
(6) 
Note that we use the scaling factor on the distance , since this will use to represent the data point belong to the its cluster centroids in the low dimensional space.
To transfer the global structure information in the the low dimension map points, we use the tdistributed distribution to characterize these centroids relations,
(7) 
To characterize the low dimensional macro structure, we define the low dimension centroids by with the formula,
(8) 
And define the corresponding lowdimensional tDistributed macro neighborhood relations by
(9) 
for and .
Now we can get the GTSNE loss function,
(10) 
where are weight parameters of the loss. The loss was composed of three parts. The first part is the tSNE loss which penalizes the mismatch between and , such that will maintain the local neighborhood relation. The second part is the macro loss , which try to make the low dimensional centroids relations match the high dimensional centroids relations. The third part is the kmeans loss , which try to make that the map points satisfies the centroids belong relations .
After some mathematical calculation, we get the gradient of loss ,
(11) 
We use the gradient descent method to optimize the loss function. The adaptive learning rate scheme described by Jacobs Jacobs1988 was used, which gradually increases the learning rate in the direction in which the gradient is stable.
Now we give the GTSNE algorithm (1) to guide the details of imagination.
Implementation details. We use the quadratic tree (BHSNE) to compute the tSNE gradient part approximately.
3 Experiments
To compare performance of GTSNE , we compare it with PCA, tSNE and UMAP algorithms, on both the simulation data and real data.
The parameters are set to the default value for each algorithm.

PCA from sklearn.decomposition.PCA.

GTSNE: , , , .

tSNE from sklearn.manifold.TSNE: .

UMAP from umap.UMAP: "n_neighbors"= 30, "min_dist"= 0.3.
3.1 Simulation Data
We first run the algorithm on the simulated data to verify the effectiveness of GTSNE. The simulated data are three continuous lines in the high dimension data, which was generated by

Choose three start points of data points , . . where
is the zeros vector with length
and is the ones vector with length . 
Generate the data points where from the three starting points and moving along the velocity one by one. i.e.
We take and to generated the data. After running the dimension reduction methods, we get the results showed in Fig 1. The figure shows that tSNE break the lines while GTSNE and UMAP do keep the lines continuity, which shows the effectiveness of the GTSNE.
Why tSNE break the continuous line? From the figure, we see that two horizontal neighbor points are separated by the vertical line. After tSNE run into this state, the gradient of tSNE at one of the neighbor points was driven by two forces. The attractive force comes from their neighbor points, which will make this point close to them. The another repulse force comes the points on the the middle lines, which will make that the point far from them. When the two forces balanced with each other, i.e. canceled to zero. The point do not move when the algorithms running. Thus tSNE will jump into the local minimum of loss, and can not jump out from it by the gradient descent. When in the loss of GTSNE, the macro loss part will strength the attractive force of two neighbor points, since if the do not close to each other, the centroid probability will do not match to there high dimensional parts, so that it will make the continuity of the lines.
But in the figure, we also see that GTSNE twists the lines in the lowdimension map. This need to be improved, which is left to the future work.
3.2 Real Data
3.2.1 Five toy datasets
Now we run the algorithm on five famous toy datasets. Their information are summarized in Table 1
Dataset  Dimension  Description 

Blobs  Isotropic Gaussian blobs  
Iris  The iris dataset  
Wine  The wine dataset  
Swiss Roll  Swiss roll dataset  
Sphere  The sphere dataset 
After running the algorithms, we get the results showed in Fig 2.
From the results we see that GTSNE worked well on these datasets. On the Swiss Roll dataset, GTSNE preserves the continuous circle structure while tSNE and UMAP only get the half circle. On the Sphere dataset, GTSNE preserves the sphere shape which are better than the results of tSNE and UMAP.
3.2.2 MNIST dataset
The MNIST database of handwritten digits
, has a training set of 60,000 examples, and a test set of 10,000 examples. Each example is an image of pixels. The whole dataset contains examples.From the result, we see that GTSNE do a comparative representation with tSNE and UMAP.
3.2.3 Pancreas dataset
We now run the algorithms on the Pancreas dataset, this dataset was used in dynamicalRNAvelocity. It is a single cell RNAseq dataset. After selecting the velocity genes, we get the final dataset which has the dimension , i.e. cells and selected velocity genes. After running the algorithms, we get the results showed in Fig 4.
From the results, we see that GTSNE works similarly with tSNE and UMAP. And GTSNE generates a continuous map which is similar to UMAP, while the result of tSNE has some breaks in the continuous structure.
4 Discussion
GTSNE use the kmeans method on the PCA embedding of the high dimensional data to grasp the macro structure, and try to preserve the relative relations of centroids by probability in the low dimensional space. But it has some limitations.
The first problem of GTSNE is that is run slowly in the large dataset, for the MNIST dataset (), it takes about one and half hour. It need to make more efficient implementation.
The second essential question is that how to define the macro structure? In this paper, we use the kmeans centroids to characterize the macro structure, it is just an initial try. Can we find more reasonable solutions? The answer need to find by the reader.
5 Conclusion
In this paper, we proposed a visualization method — GTSNE, which is a modified version of tSNE. It include the macro structure in the loss function to make that the low dimensional map preserve the macro structure. We hope that this method will help to the data visualization which need to preserve the macro structures.
Acknowledgements
Thank to my family ( especially for my mother, Qixia Chen and father, Wenjiang Shi ) for they provides me a suitable environment for this work. Thank to all the teachers who taught me to guide me to the road of truth.
Appendix A. Code availability
GTSNE are available as python package on https://github.com/songtingstone/gtsne. Scripts to reproduce results of the primary analyses will be made available on https://github.com/songtingstone/gtsne2021. The code is learned and adapted the C implementation https://github.com/danielfrg/tsne of BHSNE (BHSNE), special thanks to Laurens van der Maaten and Daniel Rodriguez.
Appendix B. Derivation of the GTSNE gradient.
The derivation of GTSNE gradient is similar to the derivation of tSNE gradient. We now give the details of the derivation. The loss function of GTSNE consists of three parts,
so does its gradient,
(12) 
The first part is
(13) 
(14) 
(15) 
The second part is
(16) 
(17) 
(18) 
The third part is
(19) 
(20) 
The derivation is finished.
Comments
There are no comments yet.