
Assignment 3 - Simple Sentiment Analysis


 

Objective

To use hash tables in a program and become familiar with the idea of sentiment analysis.


 

Task

Sentiment analysis is a simple form of natural language processing in which we try to determine if a piece of text is positive, negative or just neutral. There are many techniques for performing sentiment analysis. In this assignment we will use a simple approach.

We will do our analysis by assigning sentiment ratings to individual words and phrases. The scale will be from -5 (very negative) to 5 (very positive). We will fill up a hash table where the keys are the words or phrases and the values are the sentiments. For instance, the key "crap" maps to the value -3, while the key "awesome" maps to the value 4.

Some multi-word phrases will go in the table too. For example, the words "fed" and "up" are pretty neutral, but the phrase "fed up" together has a negative rating. All of our phrases are either a single word, or two words. There are none longer than that.

We will use the sentiments.txt file to fill up our hash table. This data file was adapted from this source. The file has 2,454 words in it.


 

Part 1: Expanding a Hash Table

One flaw in our hash table implementation is that once the size is set, it cannot be expanded. Remember that for the table to be efficient, it must be quite a bit bigger than the data being stored (to minimize collisions).

As part of this assignment, you'll fix this issue by detecting when the table is half full, and then doubling the size. This will ensure the hash table never gets more than half full.

Unfortunately, we can't simply double the size of the table and carry on. We have to rehash all of the existing entries. The reason is that the table size is part of the index calculation: we take the hash value modulo the table size. So if an item hashed to 131, but our table only had size 100, then we would actually put it in slot 31. If we then double the table size to 200, we would start looking for it in slot 131, but it won't be there.
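The arithmetic in that example can be checked with a few lines of Java. This is only an illustrative sketch: the class name RehashDemo and the helper slotFor are made up for this example and are not part of HashTable.java. It uses Math.floorMod rather than the % operator so that a negative hash code would still give a valid slot, which is a choice made here rather than something the assignment requires.

```java
// Illustrative only: shows how the same hash value lands in different
// slots once the table size changes.
class RehashDemo {
    // Hypothetical helper: maps a raw hash value to a slot for a table size.
    static int slotFor(int hash, int tableSize) {
        return Math.floorMod(hash, tableSize);
    }

    public static void main(String[] args) {
        int hash = 131;
        System.out.println(slotFor(hash, 100)); // slot in the original table
        System.out.println(slotFor(hash, 200)); // slot after doubling
    }
}
```

The two printed slots differ, which is why every existing entry has to be placed again after the arrays grow.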

Doing this will involve the following:

  1. Start by downloading HashTable.java which is the HashTable class we developed this week. You will use this as the basis for this assignment.
  2. Next, add a variable to keep track of how many entries are actually in the table. This should be set to 0 in the constructor and incremented each time the insert method is called.
  3. Add code to the top of insert to check if the number of items currently in the table is more than half of the max size. If it is, call a method you will write called expand.
  4. In the expand method (which can be private), start by creating new arrays for the keys and values which are twice as big as the current ones.
  5. Next, loop through the existing items in the table. For each item, use the key to calculate a new index (using the doubled size for your modulus). Then store the key and value in the new arrays at that index, resolving any collision the same way insert does.
  6. Finally, point the array references to the new, larger arrays.
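The steps above can be sketched roughly as follows. This is not the course's HashTable.java: the class GrowableTable, its field names, and the use of linear probing for collisions are all assumptions made for illustration, so adapt the idea to whatever the provided class actually does. The size check here is also slightly stricter than "more than half": it expands when one more entry would push the table past half full, so the table always keeps empty slots for probing.

```java
// A minimal sketch of an expanding table, assuming parallel key/value
// arrays and linear probing. All names here are hypothetical.
class GrowableTable {
    private String[] keys;
    private Integer[] values;
    private int count; // number of entries actually stored

    GrowableTable(int initialSize) {
        keys = new String[initialSize];
        values = new Integer[initialSize];
        count = 0;
    }

    // floorMod keeps the index non-negative even for negative hash codes.
    private static int indexFor(String key, int size) {
        return Math.floorMod(key.hashCode(), size);
    }

    void insert(String key, Integer value) {
        // Expand first if one more entry would make the table over half full.
        if (2 * (count + 1) > keys.length) {
            expand();
        }
        int i = indexFor(key, keys.length);
        while (keys[i] != null && !keys[i].equals(key)) {
            i = (i + 1) % keys.length; // linear probing (assumed)
        }
        if (keys[i] == null) {
            count++; // only count brand new keys
        }
        keys[i] = key;
        values[i] = value;
    }

    Integer lookup(String key) {
        int i = indexFor(key, keys.length);
        while (keys[i] != null) {
            if (keys[i].equals(key)) {
                return values[i];
            }
            i = (i + 1) % keys.length;
        }
        return null; // hit an empty slot, so the key is absent
    }

    int size() { return count; }
    int capacity() { return keys.length; }

    private void expand() {
        // Step 4: new arrays twice as big as the current ones.
        String[] newKeys = new String[keys.length * 2];
        Integer[] newValues = new Integer[values.length * 2];
        // Step 5: rehash every existing entry using the doubled size.
        for (int old = 0; old < keys.length; old++) {
            if (keys[old] == null) {
                continue;
            }
            int i = indexFor(keys[old], newKeys.length);
            while (newKeys[i] != null) {
                i = (i + 1) % newKeys.length;
            }
            newKeys[i] = keys[old];
            newValues[i] = values[old];
        }
        // Step 6: point the array references at the larger arrays.
        keys = newKeys;
        values = newValues;
    }
}
```

Starting such a table at a very small size and inserting a few hundred keys is a quick way to confirm that every key is still found after several expansions.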

 

Part 2: Sentiment Analysis

Now we can use our more capable hash table implementation to build the sentiment analysis program. To do this, do the following:

  1. Make a HashTable mapping Strings to Integers. In order to make sure your expanding code works, set it to a small initial size, like 50.
  2. Write code to load the data file into a hash table. You can assume the "sentiments.txt" file will be in the same directory as your program.
  3. Next, get user input from the keyboard. You should keep reading input until the user types the word "END".
  4. Remove all punctuation from the words you read and convert them to lower-case. This lets the input match the words in the data file more easily.
  5. Keep track of the total sentiment of the input. For each word, check if it's in the hash table. If so, add the sentiment into the running total.
  6. You will also have to check for the two-word phrases. To do this, just keep track of the previous word you read too.
  7. After seeing all the input, print out the number of words, the total sentiment, and the average sentiment of the text (with two decimal places).
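The word handling in steps 4 through 7 might look something like the sketch below. Several things here are assumptions rather than requirements: java.util.HashMap stands in for the course HashTable, the names SentimentSketch, normalize, and analyze are invented, tokens that become empty after stripping punctuation are skipped, and a matching two-word phrase is added on top of any single-word matches. The rating used for "fed up" is a placeholder, not the value from sentiments.txt.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// A rough sketch of the scoring loop; HashMap stands in for the course table.
class SentimentSketch {
    // Step 4: strip ASCII punctuation and lower-case the word.
    static String normalize(String word) {
        return word.replaceAll("\\p{Punct}", "").toLowerCase(Locale.ROOT);
    }

    // Returns {wordCount, totalSentiment} for one chunk of input text.
    static int[] analyze(String text, Map<String, Integer> table) {
        int words = 0;
        int total = 0;
        String previous = null; // step 6: remember the previous word
        for (String raw : text.trim().split("\\s+")) {
            String word = normalize(raw);
            if (word.isEmpty()) {
                continue; // nothing left after removing punctuation
            }
            words++;
            Integer single = table.get(word); // step 5: single-word match
            if (single != null) {
                total += single;
            }
            if (previous != null) {
                Integer pair = table.get(previous + " " + word);
                if (pair != null) {
                    total += pair; // two-word phrase match
                }
            }
            previous = word;
        }
        return new int[] { words, total };
    }

    public static void main(String[] args) {
        Map<String, Integer> table = new HashMap<>();
        table.put("awesome", 4);
        table.put("crap", -3);
        table.put("fed up", -3); // placeholder rating for illustration

        int[] result = analyze("I am fed up. This is crap, not AWESOME!", table);
        double average = result[0] == 0 ? 0.0 : (double) result[1] / result[0];
        // Step 7: two decimal places for the average.
        System.out.printf(Locale.ROOT, "Words: %d%nSentiment: %d%nOverall: %.2f%n",
                result[0], result[1], average);
    }
}
```

In the real program the loop would read lines from the keyboard until "END" is typed, carrying the previous word across line breaks so that a phrase split over two lines can still match.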

 

Example Runs

$ java SentimentAnalysis
Enter text:
The Mexican restaurant downtown is in a charming location and has a nice menu
of delicious fare.  They serve homemade tortillas that are soft and tasty, and
the soup is incredible.
END 

Words: 31
Sentiment: 19
Overall: 0.61

$ java SentimentAnalysis
Enter text:
Game of Thrones final season was atrocious.  Fans largely hated where the plot
went and found the writing disappointing, with characters making decisions that
make no sense.
END

Words: 27
Sentiment: -12
Overall: -0.44

In this second example, note that the phrase "no sense" has to be matched!


 

General Requirements


 

Submitting

When you are done, submit your code for this program on Canvas.

Copyright © 2024 Ian Finlayson | Licensed under a Creative Commons BY-NC-SA 4.0 License.