CSL862: Assignment 3 on Map-Reduce

Assignment

  1. Generate at least 50,000 random sentences, each at most 140 characters long, using words drawn from a fixed set of 20-30 words.
  2. Find all sets of sentences that are at least 90% similar to each other, i.e. sentences in which at least 90% of the words match. Use Hadoop on AWS.
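Step 1 can be sketched in a few lines of Python. This is one possible approach, not a required method. The 24-word vocabulary, the helper names `random_sentence` and `generate_sentences`, and the 10% per-word stopping probability are all hypothetical choices. The assignment only fixes the vocabulary size (20-30 words) and the 140-character limit. Each sentence keeps adding random words until a random stop, or until the next word would push it past 140 characters.

```python
import random

# Hypothetical 24-word vocabulary: the assignment only asks for 20-30 words,
# so these particular words are placeholders.
VOCAB = [
    "alpha", "beta", "gamma", "delta", "map", "reduce", "hadoop", "cloud",
    "data", "node", "key", "value", "sort", "shuffle", "merge", "split",
    "job", "task", "input", "output", "cluster", "disk", "file", "record",
]

def random_sentence(rng, vocab, max_chars=140, stop_prob=0.1):
    """Append random words until a random stop or until the next word would exceed max_chars."""
    words = [rng.choice(vocab)]
    length = len(words[0])
    # stop_prob is an assumption; the assignment does not fix a length distribution.
    while rng.random() >= stop_prob:
        word = rng.choice(vocab)
        if length + 1 + len(word) > max_chars:  # +1 for the separating space
            break
        words.append(word)
        length += 1 + len(word)
    return " ".join(words)

def generate_sentences(n=50_000, vocab=VOCAB, max_chars=140, seed=42):
    """Return n sentences; a fixed seed makes the data set reproducible across runs."""
    rng = random.Random(seed)
    return [random_sentence(rng, vocab, max_chars) for _ in range(n)]
```

You can write the output with one sentence per line, for example `open("sentences.txt", "w").write("\n".join(generate_sentences()) + "\n")`, where the file name is a placeholder. This matches the line-oriented input that Hadoop's default `TextInputFormat` reads.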
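The assignment does not pin down "90% of the words match." The sketch below therefore assumes one reading: Jaccard similarity of the two sentences' distinct-word sets, |A ∩ B| / |A ∪ B| ≥ 0.9. Positional or multiset matching would need a different test. It uses prefix filtering, a standard set-similarity-join technique, arranged as two MapReduce jobs:

* a word count that fixes a global word order (rarest first);
* a job in which each sentence emits only its first `size - ceil(0.9 * size) + 1` words as keys, and each reducer verifies the candidate pairs that share a key.

Two sets with Jaccard ≥ 0.9 must share at least one word in those prefixes, so no similar pair is missed. `run_mapreduce`, `similar_pairs` and the other names are hypothetical. `run_mapreduce` is an in-process stand-in for Hadoop's map, shuffle and reduce phases, not a Hadoop API.

```python
import math
from collections import defaultdict
from fractions import Fraction
from itertools import combinations

# "90% similar" read as Jaccard similarity of distinct-word sets >= 0.9 (an assumption).
THRESHOLD = Fraction(9, 10)

def jaccard(a, b):
    """Exact Jaccard similarity |a & b| / |a | b| of two non-empty word sets."""
    return Fraction(len(a & b), len(a | b))

def prefix_length(size, t=THRESHOLD):
    """Prefix-filtering bound: if Jaccard(r, s) >= t, then r and s share a word among
    the first size - ceil(t * size) + 1 words of each, under one global word order."""
    return size - math.ceil(t * size) + 1

def run_mapreduce(records, mapper, reducer):
    """In-process stand-in for one Hadoop job: map, group by key (shuffle), reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    output = []
    for key in sorted(groups):
        output.extend(reducer(key, groups[key]))
    return output

def similar_pairs(sentences, t=THRESHOLD):
    """Return sorted (i, j) index pairs, i < j, whose word sets have Jaccard >= t."""
    records = list(enumerate(sentences))

    # Job 1: count the sentences each word appears in; rare words make selective prefixes.
    freq = dict(run_mapreduce(
        records,
        lambda rec: ((w, 1) for w in set(rec[1].split())),
        lambda word, ones: [(word, sum(ones))],
    ))

    def candidate_mapper(record):
        sid, sentence = record
        # Global order: rarest word first, ties broken alphabetically.
        words = sorted(set(sentence.split()), key=lambda w: (freq[w], w))
        word_set = frozenset(words)
        for w in words[:prefix_length(len(words), t)]:
            yield w, (sid, word_set)

    def verify_reducer(word, entries):
        # Every pair sharing this prefix word is a candidate; keep the truly similar ones.
        entries = sorted(entries, key=lambda e: e[0])
        for (i, a), (j, b) in combinations(entries, 2):
            if jaccard(a, b) >= t:
                yield i, j

    # Job 2: candidate generation and verification. A pair can be emitted by
    # several reducers if it shares several prefix words, so deduplicate.
    return sorted(set(run_mapreduce(records, candidate_mapper, verify_reducer)))
```

On a real cluster, the word-frequency table from the first job would have to reach every mapper of the second, for example through Hadoop's distributed cache. With only 20-30 distinct words, many sentences will likely share prefix words, so expect the verification step to dominate the cost. The sketch reports pairs. Because similarity is not transitive, how to group pairs into larger "sets" is a separate design choice.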

Note:

  1. To be done in groups of two or three.
  2. The submission deadline is Nov 7.