A Streamlit app that ingests messy transactions, cleans them, and visualizes spend so people make better money choices.
Tools: Python • Streamlit • Pandas • Plotly
Role: Solo builder
Users: Friends managing personal and small-business expenses
Problem
People track spending across banks and formats. Importing and cleaning that data is tedious and error-prone, which means insights arrive too late to change behavior.
Solution
I built a Streamlit app that imports CSV/Excel/JSON/PDF, standardizes columns, de-duplicates, and categorizes transactions with Pandas. Plotly charts highlight trends, categories, and outliers; the UI focuses on quick filters and summary KPIs.
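The standardize, de-duplicate, and categorize steps could look roughly like the sketch that follows. It assumes a raw frame with date, amount, and description columns, treats rows with the same date, amount, and description as duplicates, and uses a hypothetical keyword table (`CATEGORY_RULES`) for categories. The app's actual matching rules aren't described here, so treat this as an illustration rather than the real implementation.

```python
import pandas as pd

# Hypothetical keyword-to-category rules for illustration only.
CATEGORY_RULES = {
    "Groceries": ["grocery", "supermarket"],
    "Transport": ["uber", "fuel", "metro"],
    "Dining": ["restaurant", "cafe", "pizza"],
}


def categorize(description: str) -> str:
    """Return the first category whose keyword appears in the description."""
    text = description.lower()
    for category, keywords in CATEGORY_RULES.items():
        if any(keyword in text for keyword in keywords):
            return category
    return "Other"


def clean_transactions(df: pd.DataFrame) -> pd.DataFrame:
    """Standardize, de-duplicate, and categorize a raw transactions frame."""
    out = df.copy()
    # Normalize headers so "Date ", "date" and "DATE" all become "date".
    out.columns = [c.strip().lower() for c in out.columns]
    out["date"] = pd.to_datetime(out["date"], errors="coerce")
    out["amount"] = pd.to_numeric(out["amount"], errors="coerce")
    out["description"] = out["description"].astype(str).str.strip()
    # Rows with an unparseable date or amount cannot be analyzed, so drop them.
    out = out.dropna(subset=["date", "amount"])
    # Identical date + amount + description is treated as a duplicate import.
    out = out.drop_duplicates(subset=["date", "amount", "description"])
    out["category"] = out["description"].map(categorize)
    return out.reset_index(drop=True)
```

Stripping whitespace before de-duplicating matters: the same bank row exported twice often differs only by trailing spaces.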
Impact
Early users consolidated months of transactions in minutes and identified 2–3 categories to cut immediately. Duplicate detection reduced manual fixes and improved confidence in the numbers.
Role & Context
Solo project completed over ~3 weeks. Tested with friends who manage personal and small-business expenses. Tech: Python, Streamlit, Pandas, Plotly.
KEY FEATURES
Multi-Format Import: Supports CSV, Excel, JSON, TXT, Parquet, PDF, and ZIP formats
Automated Data Cleaning: Fuzzy matching for column mapping and data standardization
Interactive Dashboards: Visualizations for income vs. expenses, category breakdowns, and trends
Anomaly Detection: Identifies duplicate and anomalous transactions automatically
User Authentication: Secure user management with password hashing
Admin Panel: Admin interface for managing users and their data
Data Profiling: Automated data quality reports using YData-Profiling and Sweetviz
Export Capabilities: Export cleaned data and reports in multiple formats
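The source doesn't say which anomaly-detection method the app uses. One common choice is Tukey's IQR fences applied within each category, so a large grocery bill is judged against other grocery bills rather than against rent. The sketch that follows assumes `category` and signed `amount` columns; `flag_amount_outliers` and its `k` parameter are hypothetical names.

```python
import pandas as pd


def flag_amount_outliers(df: pd.DataFrame, k: float = 1.5) -> pd.Series:
    """Mark transactions whose absolute amount falls outside Tukey's fences
    (Q1 - k*IQR, Q3 + k*IQR) computed within the transaction's own category."""
    size = df["amount"].abs()
    grouped = size.groupby(df["category"])
    # transform broadcasts each group's quartile back onto that group's rows.
    q1 = grouped.transform(lambda s: s.quantile(0.25))
    q3 = grouped.transform(lambda s: s.quantile(0.75))
    iqr = q3 - q1
    return (size < q1 - k * iqr) | (size > q3 + k * iqr)
```

With very small categories the quartiles are unstable, so in practice a minimum group size before flagging would be a sensible addition.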
RESULTS AND INSIGHTS
Identified top spending categories and peak expense periods
Detected duplicate and anomalous transactions, reducing data errors
Visualized monthly savings trends, highlighting opportunities for budget optimization
Provided interactive dashboards for income vs. expenses analysis
Generated category breakdowns through pie charts and time series plots
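The monthly income vs. expenses and savings-trend views reduce to one aggregation. The sketch that follows assumes a sign convention the source doesn't state (income positive, expenses negative) and a parsed `date` column; `monthly_summary` is a hypothetical name. The resulting frame is the kind of input a Plotly line or bar chart would plot.

```python
import pandas as pd


def monthly_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Aggregate signed transactions into monthly income, expenses, and savings."""
    month = df["date"].dt.to_period("M")
    # Split signed amounts into two non-negative columns.
    income = df["amount"].where(df["amount"] > 0, 0.0)
    expenses = -df["amount"].where(df["amount"] < 0, 0.0)
    summary = pd.DataFrame({"income": income, "expenses": expenses}).groupby(month).sum()
    summary["savings"] = summary["income"] - summary["expenses"]
    # Savings rate is NaN for months with no income instead of dividing by zero.
    summary["savings_rate"] = summary["savings"] / summary["income"].where(summary["income"] > 0)
    return summary
```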
SKILLS DEMONSTRATED
Data Cleaning • Data Preprocessing • Fuzzy Matching • Multi-Format Data Import • Data Profiling • Anomaly Detection • Time Series Analysis • Data Visualization • Interactive Dashboards • User Authentication • PDF Data Extraction • Streamlit Development • Data Validation • Feature Engineering
CHALLENGES AND SOLUTIONS
Challenge: Data quality issues with inconsistent formats and missing values in user-uploaded files
Solution: Implemented automated column mapping with fuzzy matching and robust data validation to handle diverse file formats and missing data.
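Fuzzy column mapping can be sketched with the standard library's `difflib.get_close_matches`, which ranks candidates by `SequenceMatcher` similarity. The canonical schema and alias lists in `CANONICAL` are hypothetical, as is the 0.75 cutoff; the app's actual matcher and thresholds aren't specified in the source.

```python
import difflib

# Hypothetical canonical columns and the header spellings expected for each.
CANONICAL = {
    "date": ["date", "transaction date", "posted date", "value date"],
    "description": ["description", "details", "memo", "narrative", "payee"],
    "amount": ["amount", "value", "transaction amount"],
}


def map_columns(headers, cutoff=0.75):
    """Map raw headers onto canonical column names by fuzzy string similarity.
    Headers whose best alias scores below `cutoff` are left unmapped."""
    alias_to_canonical = {
        alias: canon for canon, aliases in CANONICAL.items() for alias in aliases
    }
    aliases = list(alias_to_canonical)
    mapping = {}
    for header in headers:
        # Normalize case, underscores, and repeated whitespace before comparing.
        key = " ".join(header.lower().replace("_", " ").split())
        match = difflib.get_close_matches(key, aliases, n=1, cutoff=cutoff)
        if match:
            mapping[header] = alias_to_canonical[match[0]]
    return mapping
```

For example, `map_columns(["Transaction_Date", "Descripton", "AMOUNT", "Balance"])` maps the misspelled "Descripton" to `description` and leaves "Balance" unmapped for the validation step to report. This sketch does not resolve two headers competing for the same canonical name.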
Challenge: PDF extraction variability due to different bank statement layouts
Solution: Used two PDF extraction libraries (pdfplumber and tabula-py) with a fallback from one to the other, so a statement one library could not parse could still be extracted by the other.
Challenge: Performance degradation with very large datasets
Solution: Processed data in chunks and relied on efficient Pandas operations so large datasets stay manageable.
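Chunk-based processing in Pandas usually means passing `chunksize` to `pd.read_csv`, which returns an iterator of DataFrames instead of loading the whole file. The sketch that follows computes monthly spend that way, keeping only one chunk plus small per-month partial sums in memory. The column names and the negative-means-expense convention are assumptions, and `monthly_spend_chunked` is a hypothetical name.

```python
import io

import pandas as pd


def monthly_spend_chunked(source, chunksize=50_000) -> pd.Series:
    """Sum expenses (negative amounts) per month, reading the CSV chunk by chunk."""
    partials = []
    reader = pd.read_csv(source, usecols=["date", "amount"], parse_dates=["date"], chunksize=chunksize)
    for chunk in reader:
        expenses = chunk.loc[chunk["amount"] < 0]
        # Per-chunk monthly totals are tiny compared with the chunk itself.
        partials.append((-expenses["amount"]).groupby(expenses["date"].dt.to_period("M")).sum())
    if not partials:
        return pd.Series(dtype="float64")
    # A month can span several chunks, so combine the partial sums by month.
    return pd.concat(partials).groupby(level=0).sum()


# Usage with an in-memory CSV; a file path works the same way.
sample = io.StringIO("date,amount\n2024-01-03,-10.0\n2024-02-01,-5.5\n2024-01-31,-2.0\n")
print(monthly_spend_chunked(sample, chunksize=2))
```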