This is part of the documentation for uwot.

This page is a gallery of images showing the results of running the lvish function on some datasets. For all intents and purposes, lvish should perform very similarly to the LargeVis method (see also its GitHub repository).

For details on the datasets, and to compare with the output of UMAP and t-SNE, see the UMAP examples gallery.

Gaussian Perplexity

By default, lvish calibrates Gaussian input similarities to a target perplexity, just like t-SNE. Those results are given below.

There are two images per dataset. The left-hand image uses a perplexity of 15, which is similar to the sort of neighborhood size UMAP uses by default. The right-hand image uses a perplexity of 50, which is the LargeVis default.

The only other non-default setting was pca = 100, which reduces the input dimensionality to 100 via PCA.

iris_lv15 <- lvish(iris, pca = 100, perplexity = 15)
iris_lv50 <- lvish(iris, pca = 100, perplexity = 50)

Note that by default lvish uses a random initialization and a much larger number of epochs, to match the LargeVis defaults. This makes the optimization take a lot longer than UMAP. LargeVis uses multiple threads during the optimization phase, but lvish does not, to ensure reproducibility of results with a fixed random seed. To get multi-threaded performance like LargeVis, add the option n_sgd_threads = "auto", e.g.:

iris_lv15 <- lvish(iris, pca = 100, perplexity = 15, n_sgd_threads = "auto")

I would also suggest setting the number of epochs to a smaller value initially and seeing if that provides an adequate visualization.

iris_lv15 <- lvish(iris, pca = 100, perplexity = 15, n_sgd_threads = "auto", n_epochs = 500)

iris

iris p15 iris p50

s1k

s1k p15 s1k p50

oli

oli p15 oli p50

frey

frey p15 frey p50

coil20

coil20 p15 coil20 p50

coil100

coil100 p15 coil100 p50

mnist

mnist p15 mnist p50

fashion

fashion p15 fashion p50

kuzushiji

kuzushiji p15 kuzushiji p50

norb

norb p15 norb p50

tasic2018

tasic2018 p15 tasic2018 p50

macosko2015

macosko2015 p15 macosko2015 p50

Default initialization in lvish, as with LargeVis and t-SNE, is from a random distribution. One issue with this, as with t-SNE, is that clusters sometimes get split up by another cluster and are unable to re-merge. MNIST is the easiest example in which to see this.

In general, for larger datasets there's not a huge difference in the effect of increasing perplexity. For smaller datasets, it's apparent that the resulting clusters tend to be more spread out at larger perplexity values. The norb (small NORB) dataset shows an obvious difference: the perplexity = 15 results are clearly too low, and break up the structures that are apparent at perplexity = 50. A similar effect is seen when using UMAP, so I don't think this is due to the random initialization of lvish. A contributing factor is likely that the initial PCA reduction to 100 dimensions is too aggressive for NORB and reduces the nearest neighbor accuracy, which is recovered at higher perplexities (as these require finding more near neighbors).

On the other hand, it's hard to tell what's going on with the coil20 and especially the coil100 results from the static images above. With a closer look, it becomes apparent that, in contrast to the norb results, the perplexity = 50 value is too high here, and the loop structure of the clusters gets broken up.

The coil100 and coil20 results show an issue with LargeVis (and UMAP) that isn't normally a problem with t-SNE: the extra repulsion in their cost functions can spread the clusters quite far apart relative to their sizes. t-SNE has the opposite problem: the clusters expand into a large circular form, which makes them harder to discern as datasets get larger, but in a single static plot I find the t-SNE results easier to examine. For UMAP and lvish, you may have to resort to more interactive means of examining the data, such as the embed_plotly function in vizier.
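As a sketch of that interactive workflow, something like the following should work if you have vizier installed (the mnist object and its Label column are assumptions here; the exact embed_plotly arguments may differ between vizier versions):

```r
# Interactively examine an embedding with vizier (hypothetical object names):
# devtools::install_github("jlmelville/vizier")
library(vizier)

mnist_lv <- lvish(mnist, pca = 100, perplexity = 15)
# color points by the digit label to make clusters easier to pick apart
embed_plotly(mnist_lv, mnist$Label)
```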

An alternative for lvish is to modify the repulsion_strength parameter (referred to as gamma in LargeVis). The default value of 7 is taken from the LargeVis paper but seems to have been chosen empirically. Here are results for coil20 with perplexity = 50, with the repulsion reduced to repulsion_strength = 0.7 in the left image and repulsion_strength = 0.07 in the right:
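These settings can be reproduced with something like the following (the coil20 object is assumed to be already loaded; see the UMAP examples gallery for details of the dataset):

```r
# Reduce the repulsion from the LargeVis default of 7
# (coil20_r07 / coil20_r007 are just illustrative names)
coil20_r07  <- lvish(coil20, pca = 100, perplexity = 50, repulsion_strength = 0.7)
coil20_r007 <- lvish(coil20, pca = 100, perplexity = 50, repulsion_strength = 0.07)
```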

coil20 p50 r0.7 coil20 p50 r0.07

This helps a bit, but there are limits: the blue cluster on the right remains an outlier, and reducing the repulsion_strength too far causes some of the loops to shrink, as can be seen with the outlying blue and black clusters in the right-hand plot.

KNN Perplexity

As an alternative to using Gaussian perplexities, you can use the k-nearest neighbor graph directly: the similarity of \(i\) with \(j\) is set to 1 if \(i\) is in the k-nearest neighbors of \(j\), and 0 otherwise. The usual t-SNE symmetrization (but not the normalization step) is then carried out. Some t-SNE implementations use a kNN kernel in this way, e.g. the majorization-minimization approach of Yang and co-workers.
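To illustrate, the kNN kernel amounts to something like this base-R sketch on a toy adjacency matrix (this is an illustration of the idea, not the actual uwot internals):

```r
# Toy 4-point example: A[i, j] = 1 if j is among the k-nearest neighbors of i
A <- matrix(c(0, 1, 1, 0,
              1, 0, 1, 0,
              0, 1, 0, 1,
              1, 1, 0, 0), nrow = 4, byrow = TRUE)

# t-SNE-style symmetrization, but skipping the normalization to sum to 1:
V <- (A + t(A)) / 2
# mutual neighbors get weight 1, one-directional neighbors get weight 0.5,
# and non-neighbors stay at 0, so the edge set remains sparse
```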

The advantage of using the kNN kernel is that you get a sparser set of edges, which, because lvish uses the same formula for determining the number of iterations required, results in a shorter run time.

For the results below, the kNN kernel was used by setting kernel = "knn":

iris_lv15k <- lvish(iris, pca = 100, perplexity = 15, kernel = "knn")
iris_lv50k <- lvish(iris, pca = 100, perplexity = 50, kernel = "knn")

For comparison, the top row of images uses the same settings as in the previous section with Gaussian perplexities, and the bottom row shows the results of using the kNN kernel. The run time for each embedding is given in the image. I used version 0.1.3 of uwot from CRAN.

iris

iris p15g iris p50g
iris p15k iris p50k

s1k

s1k p15g s1k p50g
s1k p15k s1k p50k

oli

oli p15g oli p50g
oli p15k oli p50k

frey

frey p15g frey p50g
frey p15k frey p50k

coil20

coil20 p15g coil20 p50g
coil20 p15k coil20 p50k

coil100

coil100 p15g coil100 p50g
coil100 p15k coil100 p50k

mnist

mnist p15g mnist p50g
mnist p15k mnist p50k

fashion

fashion p15g fashion p50g
fashion p15k fashion p50k

kuzushiji

kuzushiji p15g kuzushiji p50g
kuzushiji p15k kuzushiji p50k

norb

norb p15g norb p50g
norb p15k norb p50k

tasic2018

tasic2018 p15g tasic2018 p50g
tasic2018 p15k tasic2018 p50k

macosko2015

macosko2015 p15g macosko2015 p50g
macosko2015 p15k macosko2015 p50k

For smaller datasets, the kNN kernel gives noticeably different results from the Gaussian perplexity, particularly at perplexity = 50. For iris, s1k and oli, the trend seems to be that the results are more expanded with the kNN kernel. For frey, coil20 and coil100, the clusters are more separated.

For the larger datasets, the difference in behavior is less pronounced, although the macosko2015 results show a larger cluster separation as well.

The good news is that in all cases, the run times are noticeably reduced.

So for larger datasets, using kernel = "knn" seems a reasonable choice for reducing the run time of lvish. It also seems that you may want to use a smaller perplexity than you would with the Gaussian kernel, which further reduces the run time. For smaller datasets, results are more mixed, but there a smaller perplexity seems definitely to be preferred.