To be able to use hash tables in a program, and to become familiar with the idea of sentiment analysis.
Sentiment analysis is a simple form of natural language processing in which we try to determine if a piece of text is positive, negative or just neutral. There are many techniques for performing sentiment analysis. In this assignment we will use a simple approach.
We will do our analysis by assigning sentiment ratings to individual words and phrases. The scale runs from -5 (very negative) to 5 (very positive). We will fill up a hash table where the keys are the words or phrases and the values are the sentiments. For instance, the key "crap" maps to the value -3, while the key "awesome" maps to the value 4.
Some multi-word phrases will go in the table too. For example, the words "fed" and "up" are pretty neutral, but the phrase "fed up" together has a negative rating. All of our phrases are either a single word, or two words. There are none longer than that.
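As a rough illustration of what the table holds, here is a minimal sketch that uses java.util.HashMap as a stand-in for the course's own hash table class. The class and method names are hypothetical, the rating given to "fed up" is a made-up placeholder (the text only says it is negative), and treating a missing key as neutral (0) is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

class SentimentTableSketch {
    // Builds a tiny stand-in table; the real program would fill it from sentiments.txt.
    static Map<String, Integer> sampleTable() {
        Map<String, Integer> table = new HashMap<>();
        table.put("crap", -3);      // rating given in the assignment text
        table.put("awesome", 4);    // rating given in the assignment text
        table.put("fed up", -3);    // ASSUMED value, for illustration only
        return table;
    }

    // Looks up a word or phrase; keys that are absent are treated as neutral (assumption).
    static int rating(Map<String, Integer> table, String key) {
        return table.getOrDefault(key, 0);
    }

    public static void main(String[] args) {
        Map<String, Integer> table = sampleTable();
        System.out.println("awesome: " + rating(table, "awesome"));
        System.out.println("fed:     " + rating(table, "fed"));
        System.out.println("up:      " + rating(table, "up"));
        System.out.println("fed up:  " + rating(table, "fed up"));
    }
}
```

Note that a two-word phrase is stored as a single key with one space between the words, so "fed up" is looked up as a whole rather than as "fed" and "up" separately.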
We will use the sentiments.txt file to fill up our hash table. This data file was adapted from this source. The file has 2,454 words in it.
One flaw in our hash table implementation is that once the size is set, it cannot be expanded. Remember that for the table to be efficient, it must be quite a bit bigger than the data being stored (to minimize collisions).
As part of this assignment, you'll fix this issue by detecting when the table is half full, and then doubling the size. This will ensure the hash table never gets more than half full.
Unfortunately, we can't simply double the size of the table and carry on. We have to rehash all of the existing entries. The reason is that the table size is part of the index calculation: we take the hash value modulo the table size. So if an item hashed to 131 and our table had size 100, we would actually put it in slot 31 (131 mod 100). If we then doubled the table size to 200, we would start looking for it in slot 131 (131 mod 200), but it wouldn't be there.
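The idea can be sketched as follows. This is not the course's hash table class: it is a hypothetical, minimal linear-probing table (String keys, int values, initial size of at least 1) written only to show why every entry has to be re-inserted after the array doubles. The names ResizingTableSketch, insert and expand are assumptions made for this illustration.

```java
class ResizingTableSketch {
    private String[] keys;
    private int[] values;
    private int count = 0;

    ResizingTableSketch(int size) {   // assumes size >= 1
        keys = new String[size];
        values = new int[size];
    }

    // Places a key in the given arrays with linear probing.
    // The slot depends on ks.length, which is why old slots become invalid after resizing.
    private static boolean insert(String[] ks, int[] vs, String key, int value) {
        int i = Math.floorMod(key.hashCode(), ks.length);
        while (ks[i] != null && !ks[i].equals(key)) {
            i = (i + 1) % ks.length;
        }
        boolean isNew = (ks[i] == null);
        ks[i] = key;
        vs[i] = value;
        return isNew;
    }

    void put(String key, int value) {
        // Grow first if one more entry could push the table past half full.
        if ((count + 1) * 2 > keys.length) {
            expand();
        }
        if (insert(keys, values, key, value)) {
            count++;
        }
    }

    // Doubles the arrays and rehashes every existing entry into its new slot.
    private void expand() {
        String[] oldKeys = keys;
        int[] oldValues = values;
        keys = new String[oldKeys.length * 2];
        values = new int[oldValues.length * 2];
        for (int i = 0; i < oldKeys.length; i++) {
            if (oldKeys[i] != null) {
                insert(keys, values, oldKeys[i], oldValues[i]);
            }
        }
    }

    // Returns the stored value, or null if the key is absent.
    Integer get(String key) {
        int i = Math.floorMod(key.hashCode(), keys.length);
        while (keys[i] != null) {
            if (keys[i].equals(key)) {
                return values[i];
            }
            i = (i + 1) % keys.length;
        }
        return null;
    }

    int size() { return keys.length; }
    int count() { return count; }

    public static void main(String[] args) {
        // The slot arithmetic from the text: same hash value, different table sizes.
        System.out.println(131 % 100);
        System.out.println(131 % 200);

        ResizingTableSketch table = new ResizingTableSketch(4);
        for (int i = 0; i < 100; i++) {
            table.put("word" + i, i);
        }
        System.out.println(table.count() + " entries in " + table.size() + " slots");
        System.out.println(table.get("word42"));
    }
}
```

Because the table is kept at most half full, the probing loops always reach an empty slot and terminate. Copying the old arrays straight into the larger ones, without calling insert again, would leave entries stranded in slots that lookups no longer visit.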
Doing this will involve the following:
expand
Now we can use our more capable hash table implementation to build the sentiment analysis program. To do this, do the following:
$ java SentimentAnalysis
Enter text:
The Mexican restaurant downtown is in a charming location and has a nice menu of delicious fare. They serve homemade tortillas that are soft and tasty, and the soup is incredible.
END
Words: 31
Sentiment: 19
Overall: 0.61

$ java SentimentAnalysis
Enter text:
Game of Thrones final season was atrocious. Fans largely hated where the plot went and found the writing disappointing, with characters making decisions that make no sense.
END
Words: 27
Sentiment: -12
Overall: -0.44
In this second example, note that the phrase "no sense" has to be matched!
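One possible way to match two-word phrases is to check each adjacent pair of words before falling back to the single word. The sketch below is only an illustration under several assumptions: it uses java.util.HashMap in place of the course's hash table, the class and method names are hypothetical, the ratings in main are made-up placeholders rather than values from sentiments.txt, and it assumes that a matched phrase replaces the ratings of its two words rather than adding to them. The sample runs do suggest that a phrase still counts as two words and that the overall score is the sentiment divided by the word count (19 / 31 is about 0.61).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PhraseScoringSketch {
    // Lowercases the text and splits it on anything that is not a letter or apostrophe.
    static List<String> tokenize(String text) {
        List<String> words = new ArrayList<>();
        for (String piece : text.toLowerCase().split("[^a-z']+")) {
            if (!piece.isEmpty()) {
                words.add(piece);
            }
        }
        return words;
    }

    // Sums ratings, trying the two-word phrase at each position before the single word.
    static int totalSentiment(Map<String, Integer> table, List<String> words) {
        int total = 0;
        int i = 0;
        while (i < words.size()) {
            if (i + 1 < words.size()) {
                Integer phraseRating = table.get(words.get(i) + " " + words.get(i + 1));
                if (phraseRating != null) {
                    total += phraseRating;
                    i += 2;   // ASSUMPTION: both words are consumed by the phrase
                    continue;
                }
            }
            total += table.getOrDefault(words.get(i), 0);   // unknown words count as 0
            i++;
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Integer> table = new HashMap<>();
        table.put("no sense", -2);   // placeholder rating, not from sentiments.txt
        table.put("hated", -3);      // placeholder rating, not from sentiments.txt

        List<String> words = tokenize("That makes no sense. I hated it.");
        int sentiment = totalSentiment(table, words);
        System.out.println("Words: " + words.size());
        System.out.println("Sentiment: " + sentiment);
        System.out.printf("Overall: %.2f%n", (double) sentiment / words.size());
    }
}
```

This simple version ignores sentence boundaries, so the last word of one sentence and the first word of the next are also tried as a phrase. Whether that matters depends on the phrases in the data file.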
When you are done, submit your code for this program on Canvas.
Copyright © 2024 Ian Finlayson | Licensed under a Creative Commons BY-NC-SA 4.0 License.