Fiction and News Comparison

I am exploring fiction from the 2000s and news from the 2000s, and how their word choices are similar and different. I got the files used in this project from here.

Fiction N-grams (size 2).
News N-grams (size 2).

The most common N-grams in fiction during the 2000s were "of the", "in the", "to the", "on the", "and the", "it was", and "at the". The most common N-grams in the news were "of the", "in the", "to the", "and the", and "for the". Both fiction and news used "of the", "in the", "to the", and "and the", and these were the biggest similarities between the two. I also noticed that the phrase "it was" was used a lot in fiction but did not show up among the most common news N-grams.
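
The tool used to build the figures above is not named here, so the following is only a minimal Python sketch of how size-2 N-grams (bigrams) can be counted. It assumes a simple regex tokenizer that lowercases the text and keeps runs of letters and apostrophes; `tokenize` and `top_ngrams` are hypothetical helper names, and the sample sentence is invented for illustration, not taken from the project's files.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumption: a simple tokenizer that lowercases and keeps letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def top_ngrams(text, n=2, k=5):
    tokens = tokenize(text)
    # Slide a window of length n over the tokens and join each window into a phrase.
    grams = [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    # Return the k most frequent phrases with their counts.
    return Counter(grams).most_common(k)


sample = "It was the end of the day. It was late in the city, and the rest of the story..."
print(top_ngrams(sample, n=2, k=3))
```

On a real corpus, the same function with `n=2` produces the kind of ranked list shown in the figures, though the exact counts depend on how the tokenizer handles punctuation and contractions.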

Common words in fiction.
Common words in the news.

The word "said" is very common in both types of text. I believe this mostly comes from characters talking in fiction and from the news reporting who said what. Words like "captain" and "dial" are specific to the fiction texts, and it makes sense that they do not show up in the news: the name Captain Dial appears many times in the selected fiction texts, so he seems to be a main character. Words like "pages", "government", "police", "American", and "report" are specific to the news texts. These words are common because the news covers problems throughout the world. For example, "report" could refer to a news report on something involving the government. The word "pages" could indicate a source that the news is citing in its claims.
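
Finding the most common words usually means filtering out function words first, otherwise "the" and "of" crowd out words like "said". Here is a minimal Python sketch of that idea, assuming a tiny hand-made stopword list; `STOPWORDS` and `top_words` are hypothetical names, the list is much shorter than what real text-analysis tools use, and the two sample sentences are invented for illustration.

```python
import re
from collections import Counter

# Assumption: a tiny illustrative stopword list; real tools ship much longer ones.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "on", "at",
             "it", "was", "he", "she", "they", "that", "is"}


def top_words(text, k=10):
    tokens = re.findall(r"[a-z']+", text.lower())
    # Drop function words so content words such as "said" can surface.
    content = [t for t in tokens if t not in STOPWORDS]
    return Counter(content).most_common(k)


fiction = "Captain Dial said nothing. The old man said it was late."
news = "The police said the government report was released, officials said."
print(top_words(fiction, 3))
print(top_words(news, 3))
```

Running both samples puts "said" at the top of each list, while "captain" and "police" only appear on one side, which mirrors the shared-versus-specific pattern described above.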

The use of "of the" in fiction (keyword in context).
The use of "of the" in the news (keyword in context).

The phrase "of the" is one of the top size-2 N-grams in both the news and fiction. In fiction, "of the" is followed by words such as "other", "people", "white", "family", "state", "hill", "old", "sea", and "new". In the news, it is commonly followed by "city", "American", "biggest", "late", "class", "day", "first", "elders", "Norcross", "past", and "lowest". Notice that in fiction, "of the" is mainly followed by a descriptive word, as in "white sand" and "family business". In the news, by contrast, the words that come after "of the" include "biggest", "American", "lowest", and "city". These words are less descriptive and read more like news headlines or attention grabbers, such as "biggest" and "lowest".
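
The comparison above reads the right-hand column of a keyword-in-context view. A minimal Python sketch of tallying that column, assuming the same simple regex tokenizer as before; `words_after` is a hypothetical helper, it counts only the single word after the phrase rather than building a full KWIC display, and the sample text is invented.

```python
import re
from collections import Counter


def words_after(text, phrase="of the", k=5):
    tokens = re.findall(r"[a-z']+", text.lower())
    target = phrase.split()
    n = len(target)
    following = []
    # Check every position where the phrase could start and still have a next token.
    for i in range(len(tokens) - n):
        if tokens[i:i + n] == target:
            # Record the word immediately to the right of the phrase.
            following.append(tokens[i + n])
    return Counter(following).most_common(k)


news = "The mayor of the city spoke. It was one of the biggest days of the year for the city."
print(words_after(news))
```

Applied separately to the fiction and news files, this gives the two lists of following words that the paragraph compares.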

Fiction N-grams (size 3).
News N-grams (size 3).

The common size-3 N-grams in both texts include "don't", "didn't", and other simple phrases. Phrases like "fraser outermost ring", "captain dial", and "the old man" are specific to fiction. I believe these phrases are specific to the story. However, phrases like "it's a", "it's not", and "it's the" show up a lot in the news but barely show up in the fiction text. I wonder why they show up more in the news. The fiction and news texts both have some very common words, like "said", to show what people are saying. They also have several words that are specific to each text. If I were to continue this text analysis, I would look more into the common phrases that they share.
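
Looking more closely at shared phrases could start by splitting size-3 N-grams into those used in both texts and those unique to one. This is a minimal Python sketch using set operations on N-gram counts; `ngram_counts` and `compare` are hypothetical names, and the samples are invented. Note that this tokenizer keeps contractions such as "it's" as a single token; tokenizers that split contractions produce different trigrams.

```python
import re
from collections import Counter


def ngram_counts(text, n=3):
    # Assumption: lowercase regex tokenizer that keeps apostrophes inside words.
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def compare(fiction_text, news_text, n=3):
    fic_counts = ngram_counts(fiction_text, n)
    news_counts = ngram_counts(news_text, n)
    shared = set(fic_counts) & set(news_counts)        # phrases used in both kinds of text
    fiction_only = set(fic_counts) - set(news_counts)  # phrases absent from the news sample
    news_only = set(news_counts) - set(fic_counts)     # phrases absent from the fiction sample
    return shared, fiction_only, news_only


fiction = "The old man said it's not the end."
news = "Officials said it's not the first report."
shared, fic_only, news_only = compare(fiction, news)
print(sorted(shared))
```

Sorting the shared set by combined count, rather than alphabetically, would be a natural refinement for ranking which shared phrases matter most.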