Word Probability Calculator

A cross-language text analysis tool implementing word probability calculation with stopword filtering in Java, Python, and JavaScript for comparative study.

The Problem

Implementing the same algorithm in several programming languages exposes each language's strengths, weaknesses, and idiomatic patterns. Word probability calculation is a good test case because it combines file I/O, string manipulation, data structures, and arithmetic, all areas where languages diverge significantly.

The Approach

I implemented identical functionality in Java, Python, and JavaScript: reading multiple text files, filtering 850+ stopwords, computing word frequencies, and ranking the top 5 words by probability. Each implementation follows its language's idiomatic patterns rather than being a direct translation, revealing how language design influences code structure.
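The four steps above (read the files, drop stopwords, count, rank the top 5) can be sketched in Python roughly as follows. This is a minimal illustration, not the project's code: the `STOPWORDS` set is a tiny hypothetical stand-in for the 850+ entry list, the regex tokenizer is swapped in so the sketch runs without extra dependencies, and the names `tokenize` and `top_words` are invented for the example.

```python
import re
from collections import Counter
from pathlib import Path

# Hypothetical stand-in for the project's 850+ entry stopword list.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "on"}


def tokenize(text):
    """Lowercase the text and extract words (a simple regex, assumed here for brevity)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def top_words(paths, stopwords=STOPWORDS, k=5):
    """Return up to k (word, probability) pairs, most probable first.

    The probability of a word is its count divided by the total number
    of non-stopword tokens across all input files.
    """
    counts = Counter()
    for path in paths:
        text = Path(path).read_text(encoding="utf-8")
        counts.update(t for t in tokenize(text) if t not in stopwords)

    total = sum(counts.values())
    if total == 0:
        # No usable tokens: avoid dividing by zero.
        return []
    return [(word, n / total) for word, n in counts.most_common(k)]
```

Calling `top_words(["a.txt", "b.txt"])` with any list of readable text files returns the ranked pairs; because the denominator excludes stopwords, the probabilities of all remaining words sum to 1.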

Technical Details

Python tokenizes with NLTK and counts frequencies with dictionary comprehensions. Java uses HashMap and HashSet with iterative processing, and the Stream API for sorting. JavaScript uses a functional pipeline built from Array.map, filter, and reduce. All three share the same stopword list and input files so the results are directly comparable. Each word's probability is its count divided by the total number of non-stopword tokens.
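To make the stylistic contrast concrete, here is a hedged sketch that writes the counting step twice in Python: once with a dictionary comprehension, and once as a filter-then-reduce fold that mirrors the shape of the JavaScript pipeline. Neither function is taken from the project; the token list, the three-word stopword set, and all function names are assumptions made for illustration.

```python
from functools import reduce

STOPWORDS = {"the", "and", "a"}  # hypothetical miniature stopword list
TOKENS = ["the", "cat", "and", "the", "dog", "chased", "a", "cat"]


def count_with_comprehension(tokens, stopwords):
    """Filter first, then build the frequency table with a dict comprehension."""
    kept = [t for t in tokens if t not in stopwords]
    # list.count rescans the list for every distinct word: fine for a sketch,
    # quadratic on large inputs.
    return {word: kept.count(word) for word in set(kept)}


def count_with_reduce(tokens, stopwords):
    """Filter, then fold the surviving tokens into a frequency table."""
    def add(acc, word):
        acc[word] = acc.get(word, 0) + 1
        return acc

    return reduce(add, filter(lambda t: t not in stopwords, tokens), {})


def probabilities(counts):
    """Normalize counts against the total number of non-stopword tokens."""
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()} if total else {}


if __name__ == "__main__":
    counts = count_with_comprehension(TOKENS, STOPWORDS)
    assert counts == count_with_reduce(TOKENS, STOPWORDS)
    print(probabilities(counts))
```

Both styles produce the same table (`cat` twice, `dog` and `chased` once each out of four kept tokens), so the choice between them is about readability and idiom rather than results.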

The Outcome

I completed a side-by-side comparison of three implementations that produce identical results. The exercise showed that Python excels at rapid NLP prototyping, Java catches the most errors at compile time through static typing, and JavaScript yields the most concise code through functional patterns. This comparison now informs my choice of language for new projects.

Tech Stack

Java, Python, JavaScript, NLTK
