IS5126 Hands-on with Applied Analytics - Group Assignment

Course: IS5126 Hands-on with Applied Analytics

Academic Year: 2025/2026 Sem 2

Assignment Structure: 2-Part Assignment

Total Weight: 30% of course grade (Assignment 1: 15% + Assignment 2: 15%)

Dataset: Hotel Reviews (50,000-80,000+ reviews)

Assignment Overview

Business Context

You are a group of Data Science Consultants hired by HospitalityTech Solutions, a SaaS company providing analytics platforms to hotels. They need you to build an intelligent analytics platform that helps hotel managers:

Understand customer satisfaction drivers
Identify improvement opportunities
Benchmark against competitors
Predict future trends
Optimize resource allocation

Your task is to develop a working product that hotel managers can use to make data-driven decisions.

Assignment 1: Data Foundation & Exploratory Analytics (15%)

Duration: Weeks 2-6

Submission Deadline: See Canvas

Weight: 15% of final course grade

Objectives

Build a foundational analytics system with:

Efficient data storage and retrieval (SQLite database with 50,000-80,000 reviews)
Comprehensive exploratory analysis with statistical rigor
Performance optimization through query and code profiling
Competitive benchmarking strategy for hotel comparison
User-friendly dashboard for non-technical users

Deliverables

1. GitHub Repository Structure

student-name-hotel-analytics/
├── README.md                   # Setup and usage instructions
├── requirements.txt            # Python dependencies
├── .gitignore                  
├── data/
│   ├── reviews_sample.db       # SQLite with 5000+ sample reviews
│   └── data_schema.sql         # (Optional) Schema documentation
├── notebooks/
│   ├── 01_data_preparation.ipynb
│   ├── 02_exploratory_analysis.ipynb
│   ├── 03_competitive_benchmarking.ipynb
│   └── 04_performance_profiling.ipynb
├── src/
│   ├── data_processing.py
│   ├── benchmarking.py
│   └── utils.py
├── app/
│   └── streamlit_app.py        # Dashboard application
├── profiling/
│   ├── query_results.txt       # Query profiling outputs
│   └── code_profiling.txt      # Code profiling results
└── reports/
    └── assignment1_report.pdf  # 8-10 pages (excluding cover, references, appendices)

2. Data Requirements

Timeframe: Use latest 5 years available
Volume: 50,000-80,000+ reviews (after any filtering)
Storage: SQLite database with appropriate schema design
Sample Data: Include 5,000+ reviews in repository (for TAs to test)

Why 50K-80K+ reviews?

This volume ensures that database optimization becomes meaningful and measurable. With smaller datasets, indexing and query optimization have minimal impact. Production-scale data allows you to demonstrate real performance improvements.
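
As a rough illustration of why this matters, the sketch below fills an in-memory SQLite table with synthetic rows and compares the query plan before and after adding an index. The table name `reviews`, its columns, and the index name `idx_reviews_hotel` are assumptions made for this example, not a required schema.

```python
import random
import sqlite3

def build_demo_db(n_rows=50_000, n_hotels=500, seed=0):
    """Create an in-memory database of synthetic reviews (hypothetical schema)."""
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE reviews ("
        " review_id INTEGER PRIMARY KEY,"
        " hotel_id INTEGER NOT NULL,"
        " rating REAL NOT NULL)"
    )
    rows = [(rng.randrange(n_hotels), rng.randint(1, 5)) for _ in range(n_rows)]
    conn.executemany("INSERT INTO reviews (hotel_id, rating) VALUES (?, ?)", rows)
    conn.commit()
    return conn

def query_plan(conn, sql, params=()):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    cur = conn.execute("EXPLAIN QUERY PLAN " + sql, params)
    return [row[3] for row in cur.fetchall()]

QUERY = "SELECT AVG(rating) FROM reviews WHERE hotel_id = ?"

conn = build_demo_db()
plan_before = query_plan(conn, QUERY, (42,))   # expected to be a full table scan
conn.execute("CREATE INDEX idx_reviews_hotel ON reviews (hotel_id)")
plan_after = query_plan(conn, QUERY, (42,))    # expected to use the new index

print("before:", plan_before)
print("after: ", plan_after)
```

The exact wording of the plan text differs between SQLite versions, so treat the printed strings as indicative. Timing the same query before and after (for example with `time.perf_counter`) would give the quantified improvement the report asks for.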

3. Technical Report (8-10 pages; Submit on Canvas)

Format: PDF, excluding cover page, references, and appendices

Report Header: GitHub repository URL, student name(s) and ID (start with A...)

Required Sections:

1. Executive Summary

Business problem and solution overview
Key findings

2. Data Foundation

Data filtering rationale (as used)
Schema design with ER diagram (optional)
Indexing strategy (as used with justification)
Data statistics showing 50K-80K+ review volume

3. Exploratory Data Analysis

Key insights with business implications (Think about business relevance)

4. Performance Profiling & Optimization

Query Profiling
Code Profiling
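
For code profiling, one possible starting point is the standard-library `cProfile` module. The sketch below profiles a deliberately naive function and captures the report as text, which could then be saved to `profiling/code_profiling.txt`. The function `slow_average_by_hotel` and its input data are hypothetical.

```python
import cProfile
import io
import pstats

def slow_average_by_hotel(rows):
    """Naive per-hotel average: rescans every row for each hotel (quadratic)."""
    hotel_ids = {hotel_id for hotel_id, _ in rows}
    result = {}
    for hid in hotel_ids:
        ratings = [rating for hotel_id, rating in rows if hotel_id == hid]
        result[hid] = sum(ratings) / len(ratings)
    return result

def profile_to_text(func, *args, top=10):
    """Run func under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()

# Synthetic input: 20,000 (hotel_id, rating) pairs across 200 hotels.
rows = [(i % 200, 1 + (i % 5)) for i in range(20_000)]
averages, report = profile_to_text(slow_average_by_hotel, rows)
print(report)
```

Profiling the same workload again after an optimization (here, a single pass that accumulates sums and counts per hotel) gives a before/after comparison.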

5. Competitive Benchmarking Strategy

Business Context: Hotel managers struggle to identify meaningful improvement opportunities. They ask: "Who are my real competitors? A 5-star beachfront resort shouldn't compare itself to a budget city hotel. How do we systematically identify truly comparable properties? What are similar hotels doing better than us? Where should we focus our limited improvement budget?"

Your methodology for identifying comparable hotel groups (justify your approach)

Performance analysis across different hotel groups

Identification of best practices within comparable groups

Specific, actionable recommendations for underperforming hotels

Validation of your approach (the method you choose determines which kinds of validation apply; justify your choice)

Note: This is an open-ended business problem. Propose and implement YOUR solution. There are multiple valid approaches and you need only one with proper justification.
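
To make the idea concrete, here is a minimal sketch of one of many possible approaches: group hotels by shared attributes, then measure each hotel's gap to its peer-group median on each aspect. The hotel records, the attribute names (`stars`, `setting`), and the aspect names are invented for illustration; clustering or other grouping methods would be equally valid.

```python
from collections import defaultdict
from statistics import median

# Hypothetical hotel records; attributes and aspect scores are made up.
HOTELS = [
    {"name": "A", "stars": 5, "setting": "beach", "cleanliness": 4.6, "service": 4.1},
    {"name": "B", "stars": 5, "setting": "beach", "cleanliness": 4.8, "service": 4.7},
    {"name": "C", "stars": 5, "setting": "beach", "cleanliness": 4.7, "service": 4.5},
    {"name": "D", "stars": 2, "setting": "city", "cleanliness": 3.9, "service": 3.2},
    {"name": "E", "stars": 2, "setting": "city", "cleanliness": 3.5, "service": 3.8},
]
ASPECTS = ("cleanliness", "service")

def peer_groups(hotels, keys=("stars", "setting")):
    """Group hotels that share the same values for the given attributes."""
    groups = defaultdict(list)
    for hotel in hotels:
        groups[tuple(hotel[k] for k in keys)].append(hotel)
    return dict(groups)

def gaps_to_peer_median(hotels, aspects=ASPECTS):
    """For each hotel, aspect score minus the median score of its peer group."""
    gaps = {}
    for members in peer_groups(hotels).values():
        medians = {a: median(h[a] for h in members) for a in aspects}
        for h in members:
            gaps[h["name"]] = {a: round(h[a] - medians[a], 2) for a in aspects}
    return gaps

def weakest_aspect(gaps, name):
    """Aspect where the hotel trails its peers the most."""
    return min(gaps[name], key=gaps[name].get)

gaps = gaps_to_peer_median(HOTELS)
print(gaps["A"], "-> focus on", weakest_aspect(gaps, "A"))
```

Negative gaps mark aspects where a hotel trails comparable properties, which is one way to turn the benchmarking into a specific recommendation.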

6. System Architecture & Dashboard

User interface with rationale

Key features and how they address business problems

7. Conclusion (keep it short)

Key observations, deliverables, and any limitations

Future enhancements (your thoughts as a team, in 2-3 sentences)

For Group Submissions: Include member contribution summary table at the beginning

4. Working Dashboard (Streamlit)

Functional dashboard with web interface

3-5 core features that solve business problems

User documentation in README (GitHub)

Grading Rubric (15% Total)

Component Weight Evaluation Focus

Technical Report 8% - Data foundation and schema quality
- Data volume compliance (50K-80K reviews)
- Exploratory analysis depth and insights
- Performance profiling with quantified improvements
- Competitive benchmarking strategy and validation
- Writing quality and clarity

Working Product 5% - Functionality and features (3%)
- User experience and interface (1%)
- Error handling and robustness (1%)

Code Repository 2% - Structure and organization (0.5%)
- Code quality and documentation (0.5%)
- Reproducibility (1.0%)

TOTAL 15%

Assignment 2: Production System & Advanced Analytics (15%)

Duration: Weeks 7-12

Submission Deadline: See Canvas

Weight: 15% of final course grade

Enhanced Business Problems

Your foundational system was successful. Now clients want advanced capabilities. Choose and solve 3 out of these 4 problems:

Problem 1: Predictive Intelligence (attempt any 2 out of the 3 bullets below)

Forecast hotel ratings for the next 3-6 months (time-series forecasting with LSTMs/RNNs)

Predict which reviews will be most helpful to customers (classification; compare the techniques used to select the best one for client usage)

Predict overall rating from aspect ratings (Numeric value prediction; Compare the techniques used to select the best one for client usage)
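
The "compare the techniques" step can be sketched as scoring every candidate on the same held-out split with the same metric. In the example below the data is synthetic and the two candidates are deliberately simple baselines, standing in for whatever models you actually train; the aspect names are assumptions.

```python
import random
from statistics import mean

ASPECTS = ("cleanliness", "service", "location")  # hypothetical aspect names

def make_synthetic_reviews(n=200, seed=0):
    """Synthetic data only: overall is the rounded mean of three aspect ratings."""
    rng = random.Random(seed)
    reviews = []
    for _ in range(n):
        aspects = {a: rng.randint(1, 5) for a in ASPECTS}
        overall = round(mean(aspects.values()))
        reviews.append((aspects, overall))
    return reviews

def mae(predict, reviews):
    """Mean absolute error of a predictor over (aspects, overall) pairs."""
    return mean(abs(predict(aspects) - overall) for aspects, overall in reviews)

reviews = make_synthetic_reviews()
train, test = reviews[:140], reviews[140:]   # simple 70/30 hold-out split

train_mean = mean(overall for _, overall in train)

def predict_global_mean(aspects):
    """Baseline 1: always predict the average overall rating seen in training."""
    return train_mean

def predict_aspect_mean(aspects):
    """Baseline 2: predict the unweighted mean of the aspect ratings."""
    return mean(aspects.values())

results = {
    "global_mean": mae(predict_global_mean, test),
    "aspect_mean": mae(predict_aspect_mean, test),
}
best = min(results, key=results.get)
print(results, "-> best:", best)
```

Keeping the split and metric fixed across candidates is what makes the comparison fair; a regression or neural model would simply be added as another entry in `results`.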

Problem 2: Text Intelligence & Review Analytics (attempt any 3 out of the 5 bullets below)

Automatically extract key themes and topics from review text (topic modeling, LDA, BERTopic)

Identify specific aspects customers mention most (aspect extraction, NER)

Predict review helpfulness based on text content (NLP + classification)

Analyze which specific phrases correlate with high/low ratings

(Advanced/Optional) Automated review summarization or periodic reporting to management
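
For the phrase-correlation bullet, a minimal sketch is to compare how often each two-word phrase appears in high-rated versus low-rated reviews. The scoring below (a smoothed log-ratio of document frequencies), the rating thresholds, and the sample reviews are all assumptions chosen for illustration; other association measures would also work.

```python
import math
import re
from collections import Counter

def bigrams(text):
    """Set of adjacent word pairs in a review (lowercased)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return set(zip(tokens, tokens[1:]))

def phrase_log_ratio(reviews, high_min=4, low_max=2):
    """Smoothed log-ratio of a bigram's share of high- vs low-rated reviews."""
    high, low = Counter(), Counter()
    n_high = n_low = 0
    for text, rating in reviews:
        if rating >= high_min:
            high.update(bigrams(text))
            n_high += 1
        elif rating <= low_max:
            low.update(bigrams(text))
            n_low += 1
    scores = {}
    for phrase in set(high) | set(low):
        # Add-one smoothing keeps phrases seen on only one side finite.
        p_high = (high[phrase] + 1) / (n_high + 2)
        p_low = (low[phrase] + 1) / (n_low + 2)
        scores[" ".join(phrase)] = math.log(p_high / p_low)
    return scores

# Made-up reviews as (text, rating) pairs.
REVIEWS = [
    ("The staff were friendly and the room was clean", 5),
    ("Friendly staff and a great breakfast", 4),
    ("Great breakfast and very clean room", 5),
    ("The room was dirty and the staff were rude", 1),
    ("Dirty room and slow check in", 2),
    ("Average stay, nothing special", 3),
]

scores = phrase_log_ratio(REVIEWS)
top_positive = max(scores, key=scores.get)
print(top_positive, round(scores[top_positive], 3))
```

Positive scores lean towards high ratings and negative scores towards low ratings. On real data you would also want a minimum-frequency cutoff so that rare phrases do not dominate.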

Problem 3: Intelligent Automation (attempt any 3 out of the 4 bullets below)

Identify trends (upward or downward) in hotel performance over any given timeline

Highlight emerging issues (Performance drift detection and alerts, from trend monitoring above)

Recommend cross-learning opportunities from high performers to low performers (for comparable hotels)

Real-time review analysis system (streaming ML pipeline)
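
The first two bullets can be sketched with a least-squares slope for trend direction and a simple window comparison for drift alerts. The thresholds (`0.05` rating points per period, a `0.3` drop) and the monthly averages are hypothetical values you would tune on your own data.

```python
from statistics import mean

def trend_slope(values):
    """Ordinary least-squares slope of values against their index 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points to fit a slope")
    x_mean = (n - 1) / 2
    y_mean = mean(values)
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den

def classify_trend(values, threshold=0.05):
    """Label a series as upward, downward, or flat based on its slope."""
    slope = trend_slope(values)
    if slope > threshold:
        return "upward"
    if slope < -threshold:
        return "downward"
    return "flat"

def drift_alert(values, window=3, drop=0.3):
    """True when the mean of the last `window` points is at least `drop` below the earlier mean."""
    if len(values) <= window:
        return False
    baseline = mean(values[:-window])
    recent = mean(values[-window:])
    return baseline - recent >= drop

# Made-up monthly average ratings for one hotel.
monthly = [4.4, 4.5, 4.4, 4.3, 4.0, 3.9, 3.8]
print(classify_trend(monthly), drift_alert(monthly))
```

A straight-line fit is only a baseline: it ignores seasonality and review volume per month, both of which a fuller solution would probably need to account for.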

Problem 4: Knowledge Graphs & Executive QA System (attempt any 2 out of the 3 bullets below)

Build knowledge graph from hotel reviews (entities: hotels, aspects, sentiments, locations) [do think of a strategy before doing this...]

RAG-based QA system for CEO queries (e.g., "What are guests saying about our breakfast service compared to competitors?")

Natural language interface for executives to query
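
For the RAG bullet, the retrieval step alone can be sketched with TF-IDF weights and cosine similarity using only the standard library. This is a stand-in technique for illustration: a real system would more likely use learned embeddings with a vector store, and would pass the retrieved snippets to a language model to compose the answer. The class name `TinyRetriever` and the sample snippets are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

class TinyRetriever:
    """TF-IDF cosine-similarity retriever over a small list of review snippets."""

    def __init__(self, documents):
        self.documents = documents
        tokenized = [tokenize(d) for d in documents]
        n = len(documents)
        doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
        # Smoothed inverse document frequency; rarer terms get larger weights.
        self.idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in doc_freq.items()}
        self.vectors = [self._vectorize(tokens) for tokens in tokenized]

    def _vectorize(self, tokens):
        counts = Counter(t for t in tokens if t in self.idf)
        return {t: c * self.idf[t] for t, c in counts.items()}

    @staticmethod
    def _cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm_a = math.sqrt(sum(w * w for w in a.values()))
        norm_b = math.sqrt(sum(w * w for w in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    def search(self, query, k=2):
        """Return up to k documents with non-zero similarity, best first."""
        q = self._vectorize(tokenize(query))
        scored = [(self._cosine(q, v), d) for v, d in zip(self.vectors, self.documents)]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [doc for score, doc in scored[:k] if score > 0]

# Made-up review snippets.
DOCS = [
    "Breakfast buffet was fresh with plenty of choice",
    "The pool area was crowded and noisy",
    "Breakfast service was slow and the coffee was cold",
    "Check in was quick and the staff were helpful",
]

retriever = TinyRetriever(DOCS)
hits = retriever.search("What are guests saying about breakfast service?")
print(hits)
```

Restricting `DOCS` to one hotel or its peer group before searching is one way to support the "compared to competitors" part of such a query.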

Deliverables

1. Updated GitHub Repository

student-name-hotel-analytics/
├── notebooks/
│   ├── 05_predictive_modeling.ipynb      # Regression, Classification
│   ├── 06_deep_learning.ipynb            # Neural networks
│   ├── 07_time_series_forecasting.ipynb  # (if applicable)
│   ├── 08_knowledge_graph.ipynb          # (if applicable)
│   └── 09_optimization.ipynb             # (if applicable)
├── models/
│   ├── neural_net.pt                     # Saved models
│   ├── lstm_forecaster.pt                # (if applicable)
│   └── model_metadata.json
├── api/
│   ├── main.py                           # API service (optional)
│   └── routes/
├── app/
│   └── streamlit_app.py                  # Enhanced dashboard
├── tests/
│   └── test_models.py                    # (Optional)
└── reports/
    └── assignment2_report.pdf            # 8-10 pages

2. Technical Report (8-10 pages)

Format: PDF, excluding cover page, references, and appendices

Report Header: GitHub repository URL, student name(s) and ID(s)

Required Sections:

1. Executive Summary

Problems addressed (which 3 of 4?)

Technical approach summary

Business impact quantification

2. Advanced Analytics Implementation

For each of the 3 problems solved:

Business Justification: Why this problem matters

Technical Approach: Method chosen and why

Architecture: System design and components

Implementation: Key technical decisions

Evaluation: Performance metrics and results

Business Impact: How it creates value

3. System Architecture

Overall system design and component interactions

Data flow

API design (if applicable)

Deployment strategy (if applicable)

4. Business Impact Analysis

Realistic usage scenario and workflow

Quantified business value with calculations

ROI analysis

5. Conclusion

Technical achievements and challenges

System limitations

For Group Submissions: Include member contribution summary table at the end

3. Production System

Deployable application (not just notebooks)

Advanced features integrated

Comprehensive error handling

Production-quality code

Grading Rubric (15% Total)

Component Weight Evaluation Focus

Technical Report 8% - Advanced analytics quality (3 problems mandatory)
- Appropriate use of deep learning/advanced ML
- Model evaluation rigor
- Business problem-solution fit
- Quantified business impact (mandatory)
- Writing quality and clarity

Working Product 5% - Advanced features functionality (3%)
- System architecture quality (1%)
- Production readiness (1%)

Code Repository 2% - Structure and organization (0.5%)
- Code quality and documentation (0.5%)
- Reproducibility (1.0%)

TOTAL 15%

Technical Requirements

Data & Tools

Timeframe: Latest 5 years

Volume: 50,000-80,000 reviews after filtering

Storage: SQLite database

Languages: Python 3.8+

Core Libraries: pandas, numpy, scikit-learn, matplotlib/plotly/seaborn

Assignment 2 Additional: PyTorch or TensorFlow (for advanced models)

Version Control

Platform: GitHub (public or provide TA access)

Documentation: Comprehensive README with setup instructions

Reproducibility: All notebooks must run without errors

Submission Instructions

See Canvas for detailed submission instructions and deadlines

General requirements:

Submit ZIP file named: GroupName_AssignmentX.zip

PDF report should clearly list out all group members with their IDs (start with A...)

ZIP must contain: Technical report PDF with GitHub URL on cover page

For groups: Include member contribution summary in report (after cover page)

GitHub repository must be accessible with sample database

Evaluation Philosophy

We Grade Based On:

Problem-Solution Fit: Does your approach match business needs?

Technical Soundness: Is implementation correct and robust?

Practical Value: Can hotels actually use this?

Learning Demonstration: Do you understand what you built?

We DON'T Penalize:

Different technical choices (if justified)

Simpler approaches (if effective)

Not using every technique taught (or out there)

UI/UX choices

Academic Integrity

All work must be your team's original contribution

Properly cite any adapted code or ideas from external sources

AI tools (ChatGPT, Copilot) are allowed for assistance, but you must understand and be able to explain all submitted work. Declare any such usage as briefed in the day-1 'Course Introduction' slides

Plagiarism results in zero marks and academic misconduct proceedings

Groups may discuss approaches but must implement independently

Getting Help

Use Canvas to discuss questions with classmates, TAs/Instructor. Post questions on the course forum.

Build something you're proud to showcase in job interviews!

Last Updated: January 2026
Version: 2.0
