Welcome To Snake_Bytes

I would like to welcome everyone to the first installment of what we are calling Snake_Bytes: just a little nibble of some #Python snippets. Various developers from @PokitDokDev will be contributing on a weekly basis to give you added venom in your coding skills. So let's begin with the first one:

I am continually asked how one gets started with #data-science, other than putting a hashtag in a Twitter stream or getting a recommendation for it on LinkedIn. Almost all operations within machine learning start with the dot product, or scalar product.

In vector calculus, the dot product of two vectors in \(\mathbb{R}^n\) is defined to be a number. For two vectors \(\mathbf{A} = [A_1, A_2, \ldots, A_n]\) and \(\mathbf{B} = [B_1, B_2, \ldots, B_n]\), it is defined as:

\[
\mathbf{A} \cdot \mathbf{B} = \sum_{i=1}^{n} A_i B_i = A_1 B_1 + A_2 B_2 + \cdots + A_n B_n
\]

where \(\Sigma\) denotes summation and \(n\) is the dimension of the vector space. For instance, in three-dimensional space, the dot product of the vectors [1, 3, −5] and [4, −2, −1] is:

\[
[1, 3, -5] \cdot [4, -2, -1] = (1)(4) + (3)(-2) + (-5)(-1) = 4 - 6 + 5 = 3.
\]

There is also a geometric definition. In Euclidean space, a Euclidean vector is a geometric object that possesses both a magnitude and a direction. A vector can be pictured as an arrow: its magnitude is its length, and its direction is the direction in which the arrow points. The magnitude of a vector \(\mathbf{A}\) is denoted by \(\|\mathbf{A}\|\). The dot product of two Euclidean vectors \(\mathbf{A}\) and \(\mathbf{B}\) is defined by:

\[
\mathbf{A} \cdot \mathbf{B} = \|\mathbf{A}\| \, \|\mathbf{B}\| \cos(\theta),
\]

where θ is the angle between A and B.

In particular, if A and B are orthogonal, then the angle between them is 90° and

\[
\mathbf{A} \cdot \mathbf{B} = 0.
\]

At the other extreme, if they are codirectional, then the angle between them is 0° and

\[
\mathbf{A} \cdot \mathbf{B} = \|\mathbf{A}\| \, \|\mathbf{B}\|.
\]

This implies that the dot product of a vector A with itself is

\[
\mathbf{A} \cdot \mathbf{A} = \|\mathbf{A}\|^2,
\]

which gives

\[
\|\mathbf{A}\| = \sqrt{\mathbf{A} \cdot \mathbf{A}},
\]

which is the formula for the Euclidean length of the vector.
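
Plugging the running example into these formulas gives a quick sanity check:

\[
\|[1, 3, -5]\| = \sqrt{1 + 9 + 25} = \sqrt{35}, \qquad \|[4, -2, -1]\| = \sqrt{16 + 4 + 1} = \sqrt{21},
\]

\[
\cos(\theta) = \frac{\mathbf{A} \cdot \mathbf{B}}{\|\mathbf{A}\| \, \|\mathbf{B}\|} = \frac{3}{\sqrt{35}\,\sqrt{21}} \approx 0.11, \qquad \theta \approx 83.6^{\circ},
\]

so the two example vectors are nearly, but not quite, orthogonal.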

So what does this actually do for me? I think it's time to code something already!

There are several different ways to approach the coding, and each is really simple.
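
A minimal sketch of the most direct approach, looping over indices (the helper name dot_product is ours, purely for illustration):

    def dot_product(a, b):
        """Dot product of two equal-length number sequences."""
        total = 0
        for i in range(len(a)):
            total += a[i] * b[i]
        return total

    print(dot_product([1, 3, -5], [4, -2, -1]))  # 3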

or
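
The same calculation, sketched with zip() pairing up corresponding elements:

    def dot_product(a, b):
        """Dot product via zip(): pairs (a[i], b[i]) without manual indexing."""
        return sum(x * y for x, y in zip(a, b))

    print(dot_product([1, 3, -5], [4, -2, -1]))  # 3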

The advantage of the latter is that zip() does the lifting for you: it makes an iterator that aggregates elements from each of the iterables, so you never touch an index.
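
As an aside beyond the two approaches above, numerical Python code at any real scale usually reaches for NumPy, which ships the same operation in optimized form:

    import numpy as np

    a = np.array([1, 3, -5])
    b = np.array([4, -2, -1])

    print(np.dot(a, b))  # 3
    print(a @ b)         # the same inner product via the @ operator (Python 3.5+)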

The dot product can be used in applications ranging from audio and text similarity to graphics rendering. In fact, many machine learning algorithms can be expressed entirely in terms of dot products.
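
As one small illustration of that last claim, here is a sketch, with made-up weights and features, of a linear model's prediction written as nothing but a dot product:

    def predict(weights, features, bias=0.0):
        """A linear model's output: dot(weights, features) + bias."""
        return sum(w * x for w, x in zip(weights, features)) + bias

    # Hypothetical numbers, purely for illustration
    weights = [0.4, -0.2, 0.1]
    features = [1.0, 3.0, 2.0]
    print(predict(weights, features, bias=0.5))  # 0.4 - 0.6 + 0.2 + 0.5 = 0.5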

So there is the first Snake_Byte.

Hopefully it wasn't too painful.

@tctjr

About Ted Tanner

Ted Tanner is co-founder and CTO of PokitDok and an engineering executive with extensive experience ranging from startups to public corporations. Focused mainly on growth-scale computing, he has held architect positions at both Apple and Microsoft and has played instrumental roles in several start-ups, including digidesign (IPO; acquired by Avid), Crystal River Engineering (acquired by Creative Labs), MongoMusic (VP of R&D; acquired by Microsoft), and BeliefNetworks (co-founder and CTO; acquired by Benefitfocus). He was also the CTO of Spatializer Audio Labs (NASDAQ: SPAZ), a company specializing in digital signal processing solutions. He is on the IAB for the University of South Carolina Computer Science Department as well as the Center for Intelligent Systems and Machine Learning at the University of Tennessee. Mr. Tanner has published numerous articles in leading technical magazines and holds several patents in the areas of semantics, machine learning, signal processing, and signal protection.

    • Betsy Dalton on September 9, 2016 at 1:45 pm


    I need some help wrapping my head around this matrix calculus business. Adding two vectors makes sense to me, especially when I consider the operation in a two- or three-dimensional world. Cool. But the dot product. Where's that get me? So, consider this explanation I found on the google.

    Let's start simple, and treat 3 x 4 as a dot product:
    (3, 0) * (4,0)
    The number 3 is "directional growth" in a single dimension (the x-axis, let's say), and 4 is "directional growth" in that same direction. 3 x 4 = 12 means we get 12x growth in a single dimension. Ok.
    Now, suppose 3 and 4 refer to different dimensions. Let's say 3 means "triple your bananas" (x-axis) and 4 means "quadruple your oranges" (y-axis). Now they're not the same type of number: what happens when we apply growth (use the dot product) in our "bananas, oranges" universe?
    (3,0) means "Triple your bananas, destroy your oranges"
    (0,4) means "Destroy your bananas, quadruple your oranges"
    Applying (0,4) to (3,0) means "Destroy your banana growth, quadruple your orange growth". But (3, 0) had no orange growth to begin with, so the net result is 0 ("Destroy all your fruit, buddy").
    (3, 0) * (0, 4) = 0

    Now I'm really lost. Help?

      • Ted Tanner on September 9, 2016 at 2:30 pm


      Dear Ms. Dalton:

      We appreciate your interest in our blog. The dot product allows you to functionally traverse vector spaces when you want to find the similarity between, say, audio samples or text; it is also useful in backface culling, lighting, and collision detection for 3-D graphics. To be honest, I did not know it allowed you to destroy fruit. I will have to add that to the bag of algorithms.

        • Betsy Dalton on September 9, 2016 at 4:00 pm


        oh.

    • wb on September 10, 2016 at 1:20 am


    The key to understanding the geometric interpretation of an inner product, Ms. Dalton, is to think of vectors as defining the locations of *points* in space, rather than arrows. This allows us to more intuitively interpret the cosine of the angle between any two points (or objects), A and B, as a measure of the "similarity" between those objects. This "cosine similarity" can be easily computed as (A · B) / (||A|| ||B||), per Ted's post.

    We will be exploring in subsequent posts how inner products and other mathematical tools allow us to make and use models in very high dimensional vector spaces.
