I am a bit confused about the difference between the meaning of conditional probability and cosine similarity in the context of word2vec.
The conditional probability is defined in terms of the inner product of two word vectors, and it indicates the probability of a context word given a center word.
https://preview.redd.it/s3q2i71uez981.png?width=273&format=png&auto=webp&s=b721f394adbd288d96c065a8dd882e5e25a0f1e0
If I understand correctly, that means the conditional probability of "eat" given "men" is high because "men" and "eat" often appear together in sentences.
However, if we calculate the cosine similarity between two word vectors, we are finding similar words, e.g. "men" and "women" have a high similarity score.
I am confused because the conditional probability is proportional to the inner product of two word vectors, and the cosine similarity is likewise proportional to their dot product. Yet one indicates the probability of two words appearing near each other, while the other indicates words with similar semantic meaning. What am I missing? Please help.
edit:
From "Inferring complementary products from baskets and browsing sessions" by Ilya Trofimov
https://preview.redd.it/53vx61hgmz981.png?width=507&format=png&auto=webp&s=7b815fcb3a28ad32b48ebf75e3cc20e2ffb21c9f
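To make the distinction concrete, here is a toy numpy sketch with made-up vectors (purely illustrative, not trained embeddings). The conditional probability puts the raw inner products through a softmax over the whole vocabulary, so it reflects how often words co-occur during training; cosine similarity compares two vectors directly and ignores their lengths, and since words used in similar contexts get pushed toward similar directions, it ends up measuring semantic similarity.

```python
import numpy as np

# Toy vectors, made up purely for illustration (not trained embeddings).
v_center = {"men": np.array([1.0, 0.2])}                      # center ("input") vector
u_context = {"eat": np.array([0.9, 0.1]),
             "women": np.array([1.0, 0.3]),
             "rock": np.array([-0.5, 0.8])}                   # context ("output") vectors

def p_context_given_center(context, center):
    """Skip-gram conditional probability: softmax over raw inner products."""
    scores = {w: u @ v_center[center] for w, u in u_context.items()}
    z = sum(np.exp(s) for s in scores.values())
    return np.exp(scores[context]) / z

def cosine(a, b):
    """Cosine similarity: inner product of the length-normalized vectors."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(p_context_given_center("eat", "men"))          # shaped by co-occurrence during training
print(cosine(u_context["eat"], u_context["women"]))  # shaped by direction only
```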
Hello, for my Master's thesis I am researching boilerplate in corporate disclosures. Specifically, I want to (1) show that similarity in annual reports has been increasing over time and (2) find the cross-sectional characteristics that predict the amount of boilerplate. I will be using the annual reports of 1630 Nasdaq-listed firms from the years 2010-2018. I purchased the textbook "Text Mining with R: A Tidy Approach" by Silge and Robinson, but it did not answer which method to use. Specifically, to measure similarity I'm not sure whether it would be best to use cosine similarity or the Jaccard index. A friend of mine suggested TF-IDF, but I do not see how that fits within this context. Any insights are appreciated. Also, if you know of any book that would be helpful for my research, please let me know.
Thanks!
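For what it's worth, TF-IDF is not an alternative to cosine similarity; it is the weighting scheme applied to the document vectors on which cosine similarity is then computed, whereas the Jaccard index compares the sets of words and ignores frequency. A hedged sketch in Python (the report file names are hypothetical):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical: two annual reports loaded as plain-text strings.
report_2017 = open("reports/ACME_10K_2017.txt").read()
report_2018 = open("reports/ACME_10K_2018.txt").read()

# TF-IDF weights the document vectors; cosine similarity is computed on those vectors.
vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform([report_2017, report_2018])
print(cosine_similarity(tfidf[0], tfidf[1]))  # year-over-year similarity

# Jaccard, by contrast, compares the *sets* of words, ignoring frequency.
tokens_a = set(report_2017.lower().split())
tokens_b = set(report_2018.lower().split())
print(len(tokens_a & tokens_b) / len(tokens_a | tokens_b))
```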
How to Calculate Cosine Similarity in R
How to Calculate Cosine Similarity in R. The measure of similarity between two vectors in an inner product space is cosine similarity. The formula for…
https://finnstats.com/index.php/2021/08/10/how-to-calculate-cosine-similarity-in-r/
I have the following code snippet that I want to use to calculate cosine image similarity:
import numpy
import imageio
from numpy import dot
from numpy.linalg import norm

def main():
    # imageio reads as RGB by default
    a = imageio.imread("C:/datasets/00008.jpg")
    b = imageio.imread("C:/datasets/00009.jpg")
    cos_sim = dot(a, b)/(norm(a)*norm(b))

if __name__ == "__main__":
    main()
However, the dot(a, b) function is throwing the following error:
ValueError: shapes (480,640,3) and (480,640,3) not aligned: 3 (dim 2) != 640 (dim 1)
I've tried different ways of reading the two images, including cv2 and keras.image.load, but I'm getting the same error with those as well. Can anyone spot what I might be doing wrong?
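For reference, the error comes from calling dot on two 3-D (height x width x 3) arrays: numpy then attempts a matrix product and requires the last axis of a to match the second-to-last axis of b (3 vs. 640, hence the message). A minimal sketch of one way around it, assuming both images have the same shape, is to flatten each image into a 1-D vector first:

```python
import imageio
import numpy as np

# Flatten each image (H, W, 3) into a 1-D vector so the dot product is a scalar.
a = imageio.imread("C:/datasets/00008.jpg").astype(np.float64).ravel()
b = imageio.imread("C:/datasets/00009.jpg").astype(np.float64).ravel()

cos_sim = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
print(cos_sim)
```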
Suppose you have downloaded the PDF of every Shakespeare play to your computer. Now suppose you want to find the name of a Shakespeare play that you read in high school, but you can't remember its name; however, you do remember the general plot, e.g. "a Danish prince is visited by his father's ghost and holds a skull in his hand while delivering a speech". (By the way, this is the plot of Hamlet.)
Suppose you type this sentence in - could the cosine similarity be used to find out which play is most similar to this sentence? Is there a common way to solve this kind of problem?
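Yes, this is essentially a small document-retrieval problem, and cosine similarity over TF-IDF vectors is a common way to solve it. A hedged sketch, assuming the plays have already been extracted from PDF into plain-text files in a hypothetical shakespeare_txt/ directory:

```python
from pathlib import Path
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical: each play already extracted from PDF into a .txt file.
paths = sorted(Path("shakespeare_txt").glob("*.txt"))
plays = [p.read_text(encoding="utf-8", errors="ignore") for p in paths]

query = ("a danish prince is visited by his father's ghost and holds a "
         "skull in his hand while delivering a speech")

vectorizer = TfidfVectorizer(stop_words="english")
doc_vectors = vectorizer.fit_transform(plays)
query_vector = vectorizer.transform([query])

scores = cosine_similarity(query_vector, doc_vectors).ravel()
best = scores.argmax()
print(paths[best].name, scores[best])   # hopefully Hamlet
```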
I have two documents, or pieces of text data. Document 1 contains keywords, each with its own numeric score (the importance of the word):
*gre (300) india (290) art (278) galleries (257) ...*
The other document I have is a tf-idf matrix: the keywords extracted from a single document together with their tf-idf scores (again, these can be interpreted as the importance of each word):
function 0.6781, art 0.2463, galleries 0.15655, ...
So how do I compute the similarity between these two documents, given that the match on "art" between documents 1 and 2 should get a higher score (higher similarity) than the match on "galleries", because "art" is a more important keyword in both documents while "galleries" is comparatively less important? How do I do this?
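One straightforward option (a sketch, not the only way) is to treat each document as a vector of keyword weights over the union of the two keyword sets and take the ordinary cosine of those vectors: shared high-weight keywords such as "art" then dominate the dot product, while shared low-weight keywords such as "galleries" contribute less. Cosine similarity is also insensitive to each vector's overall scale, so it doesn't matter that one document scores in the hundreds and the other below 1.

```python
import numpy as np

# Keyword -> importance weight for each document (values taken from the post).
doc1 = {"gre": 300, "india": 290, "art": 278, "galleries": 257}
doc2 = {"function": 0.6781, "art": 0.2463, "galleries": 0.15655}

vocab = sorted(set(doc1) | set(doc2))
v1 = np.array([doc1.get(w, 0.0) for w in vocab])
v2 = np.array([doc2.get(w, 0.0) for w in vocab])

# Shared high-weight keywords ("art") dominate the dot product;
# shared low-weight keywords ("galleries") contribute less.
cos = v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2))
print(cos)
```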
Does anyone know where I can learn more about the origin of cosine similarity?
Also, where can I find the geometric explanation of it and how it was created?
I have read most of the posts on the net, but I'm looking for books or very old papers (something deep, not a superficial explanation) that discuss the concept behind it and how it was created.
Hi everyone,
I recently built a simple chatbot with Google's Universal Sentence Encoder, using it as a sentence embedding and finding the best response with cosine similarity. I wrote about it in a bit more detail here: https://www.papercups.io/blog/chatbot. I tried to simplify some of the explanations, since I had trouble understanding embeddings when I first learned about them.
You can also play around with the chatbot: https://app.papercups.io/bot/demo
The source code for the backend is at https://github.com/papercups-io/papercups-simple-chatbot
and the source code for the client side is at https://github.com/papercups-io/papercups/blob/master/assets/src/components/demo/BotDemo.tsx
Would love any feedback!
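For anyone curious, here is a minimal sketch of the general pattern (not the actual papercups code, and the FAQ content below is made up): embed the candidate responses once, embed each incoming message, and return the response whose question embedding has the highest cosine similarity.

```python
import numpy as np
import tensorflow_hub as hub

embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")

# Hypothetical FAQ: candidate questions mapped to canned responses.
faq = {
    "How do I reset my password?": "You can reset it from the settings page.",
    "What are your pricing plans?": "We have a free tier and a paid tier.",
}
questions = list(faq.keys())
question_vecs = embed(questions).numpy()          # embed the candidates once

def best_response(message: str) -> str:
    q = embed([message]).numpy()[0]
    sims = question_vecs @ q / (np.linalg.norm(question_vecs, axis=1) * np.linalg.norm(q))
    return faq[questions[int(sims.argmax())]]

print(best_response("I forgot my password"))
```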
I am currently working on creating a content-based recommendation system, for which I am using cosine similarities.
I have two data frames, users and items, and both have a column that holds "vectors".
Now I want to compute the cosine similarity (which is implemented as a function) between every user and the items (excluding the ones a user has already rated).
How should I approach this task? Should I create a UDF, or a general function to which I pass these dataframes for processing?
As I understand it, the calculation would be: first user against the whole dataframe of items, second user against the whole dataframe of items, and so on... I think that will be really time-consuming. So is there another approach, like joining the two dataframes and applying a cosine-similarity UDF to create a new joined result?
Any help is appreciated. TIA
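If the vectors fit in memory, one option is to skip the per-user loop entirely: stack the user vectors and item vectors into two matrices, get the full user x item similarity matrix in a single vectorized call, and mask out the already-rated items afterwards. A sketch with toy stand-in data (in Spark you would wrap something similar in a pandas UDF, but the core computation is the same):

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

rng = np.random.default_rng(0)

# Toy stand-ins for the "vectors" columns of the two dataframes.
user_vecs = rng.normal(size=(4, 8))         # 4 users, 8-dim profile vectors
item_vecs = rng.normal(size=(6, 8))         # 6 items, 8-dim content vectors
already_rated = np.zeros((4, 6), dtype=bool)
already_rated[0, [1, 3]] = True             # e.g. user 0 has already rated items 1 and 3

# One vectorized call instead of looping user by user.
sims = cosine_similarity(user_vecs, item_vecs)   # shape (n_users, n_items)
sims[already_rated] = -np.inf                    # exclude items already rated

top3 = np.argsort(-sims, axis=1)[:, :3]          # top-3 item indices per user
print(top3)
```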
Under what circumstances would BERT assign similar embeddings to two occurrences of the same word? When those occurrences appear in similar syntactic relations with their co-occurrents?
I thought BioBERT was only trained on an English biomedical corpus, but I ran some sample code to calculate semantic similarities and found that it works on Chinese and French words as well. Why?
Below is the code I ran:
import torch
import argparse
import logging
from transformers import BertConfig, BertForPreTraining, load_tf_weights_in_bert
from transformers import BertTokenizer, BertModel

model_version = 'biobert_v1.1_pubmed'
do_lower_case = True
model = BertModel.from_pretrained(model_version)
tokenizer = BertTokenizer.from_pretrained(model_version, do_lower_case=do_lower_case)

from sklearn.metrics.pairwise import cosine_similarity

def embed_text(text, model):
    input_ids = torch.tensor(tokenizer.encode(text)).unsqueeze(0)  # Batch size 1
    outputs = model(input_ids)
    last_hidden_states = outputs[0]  # The last hidden-state is the first element of the output tuple
    return last_hidden_states

def get_similarity(em, em2):
    return cosine_similarity(em.detach().numpy(), em2.detach().numpy())

# We will use a mean of all word embeddings.
virus_em = embed_text("virus", model).mean(1)
flu_em = embed_text("flu", model).mean(1)
virus_Chinese_em = embed_text("病毒", model).mean(1)
flu_Chinese_em = embed_text("流感", model).mean(1)

print("Similarity for virus and flu:" + str(get_similarity(virus_em, flu_em)))
print("Similarity for 病毒 and 流感:" + str(get_similarity(virus_Chinese_em, flu_Chinese_em)))
print("Similarity for virus and flu:" + str(get_similarity(virus_em, flu_em)))
print("Similarity for η ζ― and ζ΅ζ:" + str(get_similarity(virus_Chinese_em, flu_Chinese_em)))
The results are:
Similarity for virus and flu:[[0.9379361]]
Similarity for 病毒 and 流感:[[0.9999998]]
So the Chinese words can be compared as well. Why?
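One hedged explanation: this probably says more about the tokenizer than about any learned Chinese or French semantics. BioBERT reuses BERT's WordPiece vocabulary, so out-of-vocabulary text is simply broken into whatever pieces (or [UNK]) the vocabulary allows, and two short inputs that tokenize almost identically, plus the shared [CLS]/[SEP] tokens that dominate the mean, can easily produce a similarity close to 1. A quick check, reusing the same model directory as above:

```python
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("biobert_v1.1_pubmed", do_lower_case=True)

# If the two Chinese words map to near-identical (or mostly [UNK]) WordPiece tokens,
# their mean embeddings will be almost the same, which would explain the ~1.0 score.
for text in ["virus", "flu", "病毒", "流感"]:
    print(text, "->", tokenizer.tokenize(text))
```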
I'm trying to optimise my user-based collaborative filtering algorithm.
How would I generate cosine similarity between a given user and each other user in the system?
My code currently works by creating a user-user matrix where each value is the pairwise cosine similarity between a pair of users. However, this is quite inefficient, since it calculates redundant pairs when it should only calculate a given user's similarity to every other user in order to identify the top n most similar neighbours for that user.
Here is my current code:
# Calculate the pairwise similarity between every user
cosine_similarity = sklearn.metrics.pairwise.cosine_similarity(ratings_matrix_f)
# Create df mapping similarity between every user, i.e. userId1 x userId2 = cos(userId1, userId2)
cosine_similarity = pd.DataFrame(cosine_similarity, index=ratings_matrix_f.index)
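One way to avoid the redundant pairs is to pass only the target user's row as the first argument, so scikit-learn computes a single 1 x n_users row of similarities instead of the full matrix. A sketch that continues the snippet above (target_user_id is a hypothetical id taken from ratings_matrix_f.index):

```python
# Similarity of one target user against every user (1 x n_users instead of n_users x n_users).
target_row = ratings_matrix_f.loc[[target_user_id]]   # double brackets keep the row 2-D
similarities = sklearn.metrics.pairwise.cosine_similarity(target_row, ratings_matrix_f).ravel()

# Rank the other users and keep the top n neighbours.
neighbours = pd.Series(similarities, index=ratings_matrix_f.index).drop(target_user_id)
top_n_neighbours = neighbours.nlargest(10)
```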
In the original transformer paper, scaled dot-product attention is introduced [1]. In the recent SimCLR paper [2], scaled cosine similarity is used instead: the cosine similarity is computed first and then scaled by a temperature τ. While both produce logits, the SimCLR paper says:
> Table 5 shows that without normalization and proper temperature scaling, performance is significantly worse. Without ℓ2 normalization, the contrastive task accuracy is higher, but the resulting representation is worse under linear evaluation.
I am curious whether scaled cosine similarity would be beneficial for transformer attention. I wonder if anyone has experimented with this before, or has seen papers about it.
[1]: https://papers.nips.cc/paper/7181-attention-is-all-you-need.pdf
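I'm not aware of a definitive answer, but the variant is easy to prototype: L2-normalize the queries and keys and divide the resulting cosine scores by a temperature instead of sqrt(d_k). A hedged PyTorch sketch (the cosine variant below is a hypothetical modification, not something taken from either cited paper):

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    # Standard transformer attention: raw dot products scaled by sqrt(d_k).
    scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)
    return F.softmax(scores, dim=-1) @ v

def cosine_attention(q, k, v, tau=0.1):
    # Hypothetical variant: cosine similarity (L2-normalized q and k) with temperature tau.
    q = F.normalize(q, dim=-1)
    k = F.normalize(k, dim=-1)
    scores = q @ k.transpose(-2, -1) / tau
    return F.softmax(scores, dim=-1) @ v

q, k, v = (torch.randn(2, 5, 64) for _ in range(3))   # (batch, seq, d_k)
print(cosine_attention(q, k, v).shape)                # torch.Size([2, 5, 64])
```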
If I have a function f(X) whose output is a vector of length N, and corresponding labels Y, also a vector of length N, I can compute the cosine similarity between Y and f(X).
If the function f is a neural network and my loss function is cosine_sim(Y, f(X)) would it be differentiable? I.e. would I be able to train my neural network weights to minimise the cosine similarity loss?
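Cosine similarity is differentiable everywhere except where one of the vectors has zero norm, so yes, it can be used directly as a training objective, and PyTorch exposes it as a function. A minimal sketch (note that to pull f(X) toward Y you would typically minimise 1 - cos, i.e. maximise the similarity):

```python
import torch
import torch.nn.functional as F

N = 16
net = torch.nn.Linear(N, N)                    # stand-in for the neural network f
opt = torch.optim.SGD(net.parameters(), lr=0.1)

X = torch.randn(N)
Y = torch.randn(N)

for _ in range(200):
    # Differentiable wherever both vectors have nonzero norm.
    loss = 1 - F.cosine_similarity(net(X), Y, dim=0)
    opt.zero_grad()
    loss.backward()
    opt.step()

print(float(loss))   # should shrink toward 0 as f(X) aligns with Y
```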
Hi,
As I wrote above, I wanted to know whether reducing word-embedding dimensionality is necessary, and whether there is some literature about it. My goal is to cluster my word embeddings into around 100-200 semantic topics.
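For what it's worth, dimensionality reduction is not strictly required before clustering; it mainly buys speed and some noise reduction. A hedged sklearn sketch with stand-in vectors, where L2-normalising first makes ordinary k-means behave roughly like clustering by cosine similarity and PCA is an optional step:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import normalize

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(5000, 300))       # stand-in for your word vectors

# L2-normalising first makes Euclidean k-means behave roughly like cosine clustering.
X = normalize(embeddings)

# Optional: PCA mostly speeds things up; it is not strictly required.
X_reduced = PCA(n_components=50).fit_transform(X)

labels = KMeans(n_clusters=150, n_init=10, random_state=0).fit_predict(X_reduced)
print(np.bincount(labels)[:10])                  # cluster sizes for the first few clusters
```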
I need to do a cosine similarity calculation (sklearn.metrics.pairwise.cosine_similarity) on a medium-sized matrix, but I get a memory error.
> Code :
> cosine = cosine_similarity(matrixA)
> Shape of matrixA :
> 64147, 119
> My setup :
> CPU : Intel i5-6400 2.7GHz
> RAM : 8 GB
> GPU : ATI Radeon HD 5400 4.2GB (500MB VRAM, 3.7MB shared memory)
> Sample of matrixA :
productId 7 19 20 ... 70966 71739 71885
userId
3 0.074888 -0.149767 0.098233 ... -0.178418 0.148611 0.169102
4 0.074888 -0.149767 0.098233 ... -0.178418 0.148611 0.169102
7 0.074888 -0.149767 0.098233 ... -0.178418 0.148611 0.169102
There are no NaNs in the matrix before computing the cosine similarity.
I'm looking for an explanation of how this function works, and for solutions such as alternatives or optimizations (reducing the matrix, or splitting it into blocks for processing, etc.).
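The input matrix is small; the problem is the output: cosine_similarity(matrixA) returns a 64147 x 64147 matrix, which is roughly 33 GB in float64 and far more than 8 GB of RAM. A hedged sketch of one workaround: process the rows in blocks and keep only the top-k most similar rows for each, so the full matrix is never materialised (note that each row's own index appears among its neighbours; drop it if needed):

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def topk_cosine_by_blocks(A, k=10, block=2000):
    """Top-k cosine neighbours per row without building the full N x N matrix."""
    A = np.asarray(A, dtype=np.float32)
    n = A.shape[0]
    idx = np.empty((n, k), dtype=np.int32)
    val = np.empty((n, k), dtype=np.float32)
    for start in range(0, n, block):
        stop = min(start + block, n)
        sims = cosine_similarity(A[start:stop], A)            # only (block, n) at a time
        part = np.argpartition(-sims, k, axis=1)[:, :k]       # unsorted top-k indices
        idx[start:stop] = part
        val[start:stop] = np.take_along_axis(sims, part, axis=1)
    return idx, val

# Example with the shape from the post (random stand-in data).
A = np.random.rand(64147, 119)
neighbour_idx, neighbour_sim = topk_cosine_by_blocks(A, k=10)
```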
I have been working on a company project to find duplicate accounts in our database. I ran a script using TF-IDF and cosine similarity, and the results came out really well.
I really want to understand this topic better and was wondering if anybody had good pointers for me.
FYI - I am using Python and Pandas.
I'm computing semantic similarity using high-dimensional language models. Within this high-dimensional feature space, I can use cosine similarity to compute the similarity of two vectors. I could also use Euclidean distance.
For anyone in the NLP setting, have you come across other methods to do this?
Check out the code on my GitHub, built with PyTorch.
https://preview.redd.it/dt9j52f6vl151.png?width=3200&format=png&auto=webp&s=3eac98885ef4308bec56e29e50a87ec467827923
https://preview.redd.it/iyu940z8vl151.png?width=3200&format=png&auto=webp&s=4a6cd84d90853ce023625285a43f45a163c68f08
I am trying to store vectors for word/doc embeddings in a postgresql table, and want to be able to quickly pull the N rows with highest cosine similarity to a given query vector. The vectors I'm working with are numpy arrays of floats with length 100 <= L <= 1000.
I looked into the cube module for similarity search, but it is limited to vectors with <= 100 dimensions. The embeddings I am using result in vectors that are 100-dimensional at *minimum* and often much higher (depending on the settings used when training the word2vec/doc2vec models).
What is the most efficient way to store large dimensional vectors (numpy float arrays) in postgres, and perform quick lookup based on cosine similarity (or other vector similarity metrics)? How does one go about building a fast index for higher-dimensional text vector data?
I'm open to probabilistic solutions too. That is, I don't necessarily need a guarantee that I'm always getting the *most* similar item. I'd be happy to get it most of the time, and settle for something "close" sometimes.
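One common pattern (a sketch, not a Postgres-native solution) is to keep the raw vectors in the table but do the similarity search in application memory: pre-normalise the vectors once so that top-N cosine similarity reduces to a single matrix-vector product. For genuinely large collections, approximate nearest-neighbour libraries (e.g. Faiss or Annoy) or the pgvector Postgres extension are worth a look.

```python
import numpy as np

rng = np.random.default_rng(0)
vectors = rng.normal(size=(100_000, 300)).astype(np.float32)  # stand-in for stored embeddings
ids = np.arange(len(vectors))                                  # stand-in for the table's row ids

# Pre-normalise once; cosine similarity then reduces to a dot product.
unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)

def top_n(query_vec, n=10):
    q = query_vec / np.linalg.norm(query_vec)
    sims = unit @ q
    best = np.argpartition(-sims, n)[:n]          # unsorted top-n
    best = best[np.argsort(-sims[best])]          # sort those n by similarity
    return list(zip(ids[best], sims[best]))

print(top_n(rng.normal(size=300).astype(np.float32)))
```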
Hi everyone, I am new to ML and I am having some problems understanding a few concepts. I am trying to write an algorithm to find the similarity between two users.
Say I have m users with n properties each. Each property has a weight associated with it: the greater the weight, the greater its importance in the final similarity. Can I achieve this using cosine similarity? If so, what should the formula be to include the weights?
Any help is appreciated.
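Yes, one common option is a weighted cosine similarity, where every term in the dot product and in the norms is multiplied by the property's weight: cos_w(u, v) = sum_i(w_i * u_i * v_i) / ( sqrt(sum_i(w_i * u_i^2)) * sqrt(sum_i(w_i * v_i^2)) ). A minimal sketch with toy values:

```python
import numpy as np

def weighted_cosine(u, v, w):
    """Cosine similarity where property i contributes in proportion to weight w[i]."""
    u, v, w = map(np.asarray, (u, v, w))
    num = np.sum(w * u * v)
    den = np.sqrt(np.sum(w * u * u)) * np.sqrt(np.sum(w * v * v))
    return num / den

user_a = [5, 3, 0, 1]            # n properties per user (toy values)
user_b = [4, 3, 1, 1]
weights = [2.0, 1.0, 0.5, 0.5]   # larger weight = property matters more

print(weighted_cosine(user_a, user_b, weights))
```

This is equivalent to rescaling each property by sqrt(w_i) and then applying the ordinary cosine similarity.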
Hello Friends,
I used the Reddit comments dataset to find subreddits related to /r/math
Here are the top 42 results:
And here is the similarity distribution over the top 100:
The results are interesting to explore, though some tiny, not very related subs appear at the very top of the list.
I wanted to pick your collective brain to see how this could be improved.
I'm using standard cosine similarity. Each row of my matrix is a user, and each column is a subreddit where that user has ever left a comment. A cell value is the user-normalized number of comments, i.e. if I post to /r/math twice and to /r/learnmath once, then my row of the matrix would be 2/3 for /r/math and 1/3 for /r/learnmath.
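For anyone who wants to reproduce the setup, here is a toy sketch of the described matrix (made-up users and counts); the similarity between /r/math and another subreddit is the cosine between the corresponding columns:

```python
import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Toy version of the described matrix: rows = users, columns = subreddits,
# cells = each user's share of their comments in that subreddit.
counts = pd.DataFrame(
    {"math": [2, 0, 5], "learnmath": [1, 0, 2], "aww": [0, 3, 0]},
    index=["user1", "user2", "user3"],
)
normalized = counts.div(counts.sum(axis=1), axis=0)

# Similarity between subreddits = cosine between their *columns*.
sims = cosine_similarity(normalized.T)
math_sims = pd.Series(sims[list(normalized.columns).index("math")], index=normalized.columns)
print(math_sims.sort_values(ascending=False))
```

As for the tiny, loosely related subs at the top: subreddits with only a handful of overlapping commenters can get spuriously high cosine scores, so filtering out subreddits below a minimum number of distinct commenters usually helps.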
I wrote a program to analyze the books of the Bible word-for-word with the classic natural language processing measure "cosine similarity." It shows which book is most similar to which other book (and clusters could indicate authors): http://techn.ology.net/bible-books-cosine-similarity/ (arc-diagram chart at the end)
As it turns out, Neural Turing Machines and Differentiable Neural Computers both use cosine similarity for content-based addressing. However, cosine similarity does not respond to the vector magnitudes, as in:
cos_similarity([1, 0], [1, 0]) = cos_similarity([1, 0], [0.0001, 0])
Given that the query key is generated by the controller (see e.g. equation 5 of the NTM paper linked above), I could imagine the gradients running wild for small key vectors*. What I'm unsure about is whether this is necessary/intended behaviour^+, especially given that we're ignoring "what the controller is trying to say" with the query key magnitude.
One could perhaps (?) trivially modify the cosine similarity by multiplying an additional factor like:
cos_similarity(u, v) * minimum(mag(u)/mag(v), mag(v)/mag(u))
which, by the way, reminds me of the coefficient in a simple linear regression model - just symmetrised.
^* We could clamp, but that's still beside the point, imho.
^+ the insensitivity to magnitude is arguably useful in NLP applications where we're interested in (say) relative frequencies, etc.
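For concreteness, here is a minimal PyTorch sketch of the proposed modification (hypothetical, not from the NTM/DNC papers):

```python
import torch
import torch.nn.functional as F

def magnitude_aware_similarity(u, v, eps=1e-8):
    """Cosine similarity damped by the ratio of the smaller to the larger norm."""
    cos = F.cosine_similarity(u, v, dim=-1, eps=eps)
    mu, mv = u.norm(dim=-1), v.norm(dim=-1)
    ratio = torch.minimum(mu / (mv + eps), mv / (mu + eps))
    return cos * ratio

u = torch.tensor([[1.0, 0.0]])
v = torch.tensor([[0.0001, 0.0]])
print(magnitude_aware_similarity(u, u))   # ~1.0, as before
print(magnitude_aware_similarity(u, v))   # ~0.0001 instead of 1.0
```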