IS5126 Hands-on with Applied Analytics - Group Assignment
Assignment Overview
Business Context
You are a group of Data Science Consultants hired by HospitalityTech Solutions, a SaaS company providing analytics platforms to hotels. They need you to build an intelligent analytics platform that helps hotel managers:
- Understand customer satisfaction drivers
- Identify improvement opportunities
- Benchmark against competitors
- Predict future trends
- Optimize resource allocation
Your task is to develop a working product that hotel managers can use to make data-driven decisions.
Assignment 1: Data Foundation & Exploratory Analytics (15%)
Objectives
Build a foundational analytics system with:
- Efficient data storage and retrieval (SQLite database with 50,000-80,000 reviews)
- Comprehensive exploratory analysis with statistical rigor
- Performance optimization through query and code profiling
- Competitive benchmarking strategy for hotel comparison
- User-friendly dashboard for non-technical users
Deliverables
1. GitHub Repository Structure
student-name-hotel-analytics/
├── README.md # Setup and usage instructions
├── requirements.txt # Python dependencies
├── .gitignore
├── data/
│ ├── reviews_sample.db # SQLite with 5000+ sample reviews
│ └── data_schema.sql # (Optional) Schema documentation
├── notebooks/
│ ├── 01_data_preparation.ipynb
│ ├── 02_exploratory_analysis.ipynb
│ ├── 03_competitive_benchmarking.ipynb
│ └── 04_performance_profiling.ipynb
├── src/
│ ├── data_processing.py
│ ├── benchmarking.py
│ └── utils.py
├── app/
│ └── streamlit_app.py # Dashboard application
├── profiling/
│ ├── query_results.txt # Query profiling outputs
│ └── code_profiling.txt # Code profiling results
└── reports/
    └── assignment1_report.pdf # 8-10 pages (excluding cover, references, appendices)
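As an illustration of how the contents of profiling/code_profiling.txt might be produced, the sketch below runs a function under Python's built-in cProfile and captures the report as text. The functions clean_review_text and process_reviews are hypothetical stand-ins for whatever lives in your src/data_processing.py; they are not part of the assignment.

```python
import cProfile
import io
import pstats


def clean_review_text(text):
    """Hypothetical stand-in for one step of your own data processing."""
    return " ".join(text.lower().split())


def process_reviews(texts):
    """Hypothetical pipeline step: clean every review text."""
    return [clean_review_text(t) for t in texts]


def profile_to_text(func, *args, top=10):
    """Run func under cProfile and return (result, report as a string)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time and keep only the top entries.
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


sample = ["  Great   LOCATION, friendly staff  "] * 10_000
cleaned, report = profile_to_text(process_reviews, sample)
print(report)

# To keep the output in the repository (path taken from the tree above):
# with open("profiling/code_profiling.txt", "w") as fh:
#     fh.write(report)
```

Profiling the same function before and after an optimization, on the same input, gives the kind of quantified comparison the report asks for.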
2. Data Requirements
- Timeframe: Use latest 5 years available
- Volume: 50,000-80,000+ reviews (after any filtering)
- Storage: SQLite database with appropriate schema design
- Sample Data: Include 5,000+ reviews in repository (for TAs to test)
Why 50K-80K+ reviews?
This volume ensures that database optimization becomes meaningful and measurable. With smaller datasets, indexing and query optimization have minimal impact. Production-scale data allows you to demonstrate real performance improvements.
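The effect of an index at this volume can be checked directly with SQLite's EXPLAIN QUERY PLAN. The following is a minimal sketch, not a required design: the table name reviews, its columns, and the index name idx_reviews_hotel are all assumptions made for illustration, and the data is synthetic.

```python
import sqlite3
import time

# Hypothetical schema: table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reviews ("
    " review_id INTEGER PRIMARY KEY,"
    " hotel_id INTEGER NOT NULL,"
    " rating REAL NOT NULL,"
    " review_date TEXT NOT NULL)"
)
# 50,000 synthetic rows spread over 500 hypothetical hotels.
rows = [(i, i % 500, 1 + (i % 5), "2024-01-01") for i in range(50_000)]
conn.executemany("INSERT INTO reviews VALUES (?, ?, ?, ?)", rows)

QUERY = "SELECT AVG(rating) FROM reviews WHERE hotel_id = ?"


def query_plan(connection, sql, params):
    """Return the detail column of EXPLAIN QUERY PLAN as one string."""
    plan = connection.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in plan)


def time_query(connection, sql, params, repeats=200):
    """Average wall-clock seconds per execution over several repeats."""
    start = time.perf_counter()
    for _ in range(repeats):
        connection.execute(sql, params).fetchall()
    return (time.perf_counter() - start) / repeats


plan_before = query_plan(conn, QUERY, (42,))
time_before = time_query(conn, QUERY, (42,))

conn.execute("CREATE INDEX idx_reviews_hotel ON reviews(hotel_id)")

plan_after = query_plan(conn, QUERY, (42,))
time_after = time_query(conn, QUERY, (42,))

print("before:", plan_before, f"({time_before * 1e6:.1f} us/query)")
print("after: ", plan_after, f"({time_after * 1e6:.1f} us/query)")
```

Without the index the plan reports a full table scan; with it, the plan reports a search using the index. Actual timings depend on your machine and schema, so report your own measurements rather than expecting a particular speed-up.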
3. Technical Report (8-10 pages; Submit on Canvas)
Format: PDF, excluding cover page, references, and appendices
Report Header: GitHub repository URL, student name(s) and ID(s) (start with A...)
Required Sections:
1. Executive Summary
- Business problem and solution overview
- Key findings
2. Data Foundation
- Data filtering rationale (as used)
- Schema design with ER diagram (optional)
- Indexing strategy (as used with justification)
- Data statistics showing 50K-80K+ review volume
3. Exploratory Data Analysis
- Key insights with business implications (Think about business relevance)
4. Performance Profiling & Optimization
- Query Profiling
- Code Profiling
5. Competitive Benchmarking Strategy
Business Context: Hotel managers struggle to identify meaningful improvement opportunities. They ask: "Who are my real competitors? A 5-star beachfront resort shouldn't compare itself to a budget city hotel. How do we systematically identify truly comparable properties? What are similar hotels doing better than us? Where should we focus our limited improvement budget?"
- Your methodology for identifying comparable hotel groups (justify your approach)
- Performance analysis across different hotel groups
- Identification of best practices within comparable groups
- Specific, actionable recommendations for underperforming hotels
- Validation of your approach (the method you choose determines which kinds of validation are applicable; justify your choice)
Note: This is an open-ended business problem. Propose and implement YOUR solution. There are multiple valid approaches; you need only one, with proper justification.
6. System Architecture & Dashboard
- User interface with rationale
- Key features and how they address business problems
7. Conclusion (keep it short)
- Key observations, deliverables, and any limitations
- Future enhancements (your thoughts as a team, in 2-3 sentences)
For Group Submissions: Include a member contribution summary table at the beginning
4. Working Dashboard (Streamlit)
- Functional dashboard with web interface
- 3-5 core features that solve business problems
- User documentation in README (GitHub)
Grading Rubric (15% Total)
| Component | Weight | Evaluation Focus |
|---|---|---|
| Technical Report | 8% | Data foundation and schema quality; data volume compliance (50K-80K reviews); exploratory analysis depth and insights; performance profiling with quantified improvements; competitive benchmarking strategy and validation; writing quality and clarity |
| Working Product | 5% | Functionality and features (3%); user experience and interface (1%); error handling and robustness (1%) |
| Code Repository | 2% | Structure and organization (0.5%); code quality and documentation (0.5%); reproducibility (1.0%) |
| TOTAL | 15% | |
Assignment 2: Production System & Advanced Analytics (15%)
Enhanced Business Problems
Your foundational system was successful. Now clients want advanced capabilities. Choose and solve 3 out of these 4 problems:
Problem 1: Predictive Intelligence (attempt any 2 out of the 3 bullets below)
- Forecast hotel ratings for next 3-6 months (time-series forecasting with LSTMs/RNNs)
- Predict which reviews will be most helpful to customers (classification; compare the techniques used and select the best one for client usage)
- Predict overall rating from aspect ratings (numeric value prediction; compare the techniques used and select the best one for client usage)
Problem 2: Text Intelligence & Review Analytics (attempt any 3 out of the 5 bullets below)
- Automatically extract key themes and topics from review text (topic modeling, LDA, BERTopic)
- Identify specific aspects customers mention most (aspect extraction, NER)
- Predict review helpfulness based on text content (NLP + classification)
- Analyze which specific phrases correlate with high/low ratings
- (Advanced/Optional) Automated review summarization or periodic reporting to management
Problem 3: Intelligent Automation (attempt any 3 out of the 4 bullets below)
- Identify upward and downward trends in hotel performance over any given time period
- Highlight emerging issues (Performance drift detection and alerts, from trend monitoring above)
- Recommend cross-learning opportunities from high performers to low performers (for comparable hotels)
- Real-time review analysis system (streaming ML pipeline)
Problem 4: Knowledge Graphs & Executive QA System (attempt any 2 out of the 3 bullets below)
- Build knowledge graph from hotel reviews (entities: hotels, aspects, sentiments, locations) [do think of a strategy before doing this...]
- RAG-based QA system for CEO queries (e.g., "What are guests saying about our breakfast service compared to competitors?")
- Natural language interface for executives to query
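For the trend-identification bullet under Problem 3, one naive baseline is to fit a least-squares line through a hotel's monthly average ratings and read the direction off the slope. The sketch below is only a starting point under stated assumptions: the function name trend_direction and the flat_threshold value are hypothetical, the input is assumed to be evenly spaced monthly averages, and a real solution would also need to account for review counts and noise.

```python
def trend_direction(monthly_avg, flat_threshold=0.01):
    """Classify a series of monthly average ratings as up, down or flat.

    Uses the ordinary least-squares slope of rating against month index.
    flat_threshold (rating points per month) is an arbitrary illustrative
    cut-off, not a recommended value.
    """
    n = len(monthly_avg)
    if n < 2:
        return "flat", 0.0
    x_mean = (n - 1) / 2
    y_mean = sum(monthly_avg) / n
    sxy = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(monthly_avg))
    sxx = sum((i - x_mean) ** 2 for i in range(n))
    slope = sxy / sxx
    if slope > flat_threshold:
        return "up", slope
    if slope < -flat_threshold:
        return "down", slope
    return "flat", slope


# Made-up monthly averages for three hypothetical hotels.
improving = trend_direction([4.0, 4.1, 4.2, 4.3])
declining = trend_direction([4.3, 4.2, 4.1, 4.0])
steady = trend_direction([4.0, 4.0, 4.0, 4.0])
print(improving, declining, steady)
```

A baseline like this is useful mainly as a point of comparison when you justify a more robust method in the report.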
Deliverables
1. Updated GitHub Repository
student-name-hotel-analytics/
├── notebooks/
│ ├── 05_predictive_modeling.ipynb # Regression, Classification
│ ├── 06_deep_learning.ipynb # Neural networks
│ ├── 07_time_series_forecasting.ipynb # (if applicable)
│ ├── 08_knowledge_graph.ipynb # (if applicable)
│ └── 09_optimization.ipynb # (if applicable)
├── models/
│ ├── neural_net.pt # Saved models
│ ├── lstm_forecaster.pt # (if applicable)
│ └── model_metadata.json
├── api/
│ ├── main.py # API service (optional)
│ └── routes/
├── app/
│ └── streamlit_app.py # Enhanced dashboard
├── tests/
│ └── test_models.py # (Optional)
└── reports/
    └── assignment2_report.pdf # 8-10 pages
2. Technical Report (8-10 pages)
Format: PDF, excluding cover page, references, and appendices
Report Header: GitHub repository URL, student name(s) and ID(s)
Required Sections:
1. Executive Summary
- Problems addressed (which 3 of 4?)
- Technical approach summary
- Business impact quantification
2. Advanced Analytics Implementation
For each of the 3 problems solved:
- Business Justification: Why this problem matters
- Technical Approach: Method chosen and why
- Architecture: System design and components
- Implementation: Key technical decisions
- Evaluation: Performance metrics and results
- Business Impact: How it creates value
3. System Architecture
- Overall system design and component interactions
- Data flow
- API design (if applicable)
- Deployment strategy (if applicable)
4. Business Impact Analysis
- Realistic usage scenario and workflow
- Quantified business value with calculations
- ROI analysis
5. Conclusion
- Technical achievements and challenges
- System limitations
For Group Submissions: Include a member contribution summary table at the end
3. Production System
- Deployable application (not just notebooks)
- Advanced features integrated
- Comprehensive error handling
- Production-quality code
Grading Rubric (15% Total)
| Component | Weight | Evaluation Focus |
|---|---|---|
| Technical Report | 8% | Advanced analytics quality (3 problems mandatory); appropriate use of deep learning/advanced ML; model evaluation rigor; business problem-solution fit; quantified business impact (mandatory); writing quality and clarity |
| Working Product | 5% | Advanced features functionality (3%); system architecture quality (1%); production readiness (1%) |
| Code Repository | 2% | Structure and organization (0.5%); code quality and documentation (0.5%); reproducibility (1.0%) |
| TOTAL | 15% | |
Technical Requirements
Data & Tools
- Timeframe: Latest 5 years
- Volume: 50,000-80,000 reviews after filtering
- Storage: SQLite database
- Languages: Python 3.8+
- Core Libraries: pandas, numpy, scikit-learn, matplotlib/plotly/seaborn
- Assignment 2 Additional: PyTorch or TensorFlow (for advanced models)
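The timeframe and volume requirements above can be sanity-checked with a short script before submission. The sketch below assumes a hypothetical table named reviews with an ISO-8601 text column review_date, and it interprets "latest 5 years" as the five years ending at the most recent review date in the data; adapt both assumptions to your own schema and filtering rationale.

```python
import sqlite3


def check_data_requirements(conn, min_reviews=50_000, years=5):
    """Summarise review volume inside the latest `years`-year window.

    Assumes a hypothetical table `reviews` with an ISO-8601 TEXT column
    `review_date` (YYYY-MM-DD); adapt the names to your own schema.
    """
    latest = conn.execute("SELECT MAX(review_date) FROM reviews").fetchone()[0]
    # SQLite date arithmetic, e.g. date('2024-06-30', '-5 years').
    cutoff = conn.execute(
        "SELECT date(?, ?)", (latest, f"-{years} years")
    ).fetchone()[0]
    count = conn.execute(
        "SELECT COUNT(*) FROM reviews WHERE review_date > ?", (cutoff,)
    ).fetchone()[0]
    return {
        "latest": latest,
        "cutoff": cutoff,
        "reviews_in_window": count,
        "meets_minimum": count >= min_reviews,
    }


# Tiny in-memory demo: 10 synthetic reviews per year, 2015 through 2024.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reviews (review_id INTEGER PRIMARY KEY, review_date TEXT)"
)
conn.executemany(
    "INSERT INTO reviews (review_date) VALUES (?)",
    [(f"{year}-06-30",) for year in range(2015, 2025) for _ in range(10)],
)
# min_reviews is lowered here only because the demo data is tiny.
summary = check_data_requirements(conn, min_reviews=40)
print(summary)
```

Run against your full database with the default min_reviews, the returned counts can feed the data statistics in the report.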
Version Control
- Platform: GitHub (public or provide TA access)
- Documentation: Comprehensive README with setup instructions
- Reproducibility: All notebooks must run without errors
Submission Instructions
See Canvas for detailed submission instructions and deadlines
General requirements:
- Submit a ZIP file named GroupName_AssignmentX.zip
- PDF report should clearly list all group members with their IDs (start with A...)
- ZIP must contain: Technical report PDF with GitHub URL on cover page
- For groups: Include member contribution summary in report (after cover page)
- GitHub repository must be accessible with sample database
Evaluation Philosophy
We Grade Based On:
- Problem-Solution Fit: Does your approach match business needs?
- Technical Soundness: Is implementation correct and robust?
- Practical Value: Can hotels actually use this?
- Learning Demonstration: Do you understand what you built?
We DON'T Penalize:
- Different technical choices (if justified)
- Simpler approaches (if effective)
- Not using every technique taught (or out there)
- UI/UX choices
Academic Integrity
- All work must be your team's original contribution
- Properly cite any adapted code or ideas from external sources
- AI tools (ChatGPT, Copilot) are allowed for assistance, but you must understand and be able to explain all submitted work. Declare any such usage as briefed in the Day 1 'Course Introduction' slides
- Plagiarism results in zero marks and academic misconduct proceedings
- Groups may discuss approaches but must implement independently
Getting Help
Use Canvas to discuss questions with classmates, TAs/Instructor. Post questions on the course forum.
Build something you're proud to showcase in job interviews!
Last Updated: January 2026
Version: 2.0