Urban air pollution needs to be measured street by street rather than area by area. The World Health Organisation reports that 99% of people worldwide breathe air that exceeds WHO guideline limits, and that air pollution results in more than 7 million preventable deaths each year (WHO, 2023). Existing monitoring systems report pollutant levels and the Air Quality Index (AQI) for whole areas, not for individual streets. By monitoring the deposition on the leaves of trees directly adjacent to each street, a streetwise technique can identify micro-scale variations in deposition that are highly correlated with pedestrian exposure and local traffic volume. To address this gap, the Phyllo-ViT method is proposed for streetwise AQI measurement. Phyllo-ViT is a multi-modal leaf sensing system that combines leaf images, which provide visual indications of particulate deposition, with FTIR spectra, which offer chemical fingerprints of deposited VOCs and inorganic species; a Vision Transformer (ViT) extracts morphological and pollutant-related patterns from the leaf images. Combining these modalities produces a more reliable predictor of local AQI than using either one separately. Phyllo-ViT provides a scalable and non-invasive alternative to existing sensor-based systems for predicting the AQI. The combination of artificial intelligence, spectroscopy, and plant physiology is used to measure pollutants at street level in real time. The Phyllo-ViT model is evaluated using standard metrics. It achieves its highest accuracy on low-traffic residential streets, with an R² of about 0.96, and its lowest accuracy in high-traffic areas, with an R² of about 0.93.
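To make the fusion and evaluation steps concrete, the following is a minimal sketch, not the Phyllo-ViT implementation. It assumes that image embeddings (for example, from a ViT) and FTIR spectra are already available as numeric arrays, fuses them by simple concatenation, and swaps in a ridge regressor as a stand-in for the AQI prediction head. The function names `fuse_features`, `fit_ridge`, `predict`, and `r2_score`, and the synthetic data, are all illustrative assumptions; only the R² metric itself is taken from the abstract.

```python
import numpy as np


def fuse_features(image_emb, ftir_spec):
    """Standardize each modality separately, then concatenate column-wise.

    Assumption: feature-level (early) fusion; the paper may fuse differently.
    """
    def zscore(x):
        x = np.asarray(x, dtype=float)
        return (x - x.mean(axis=0)) / (x.std(axis=0) + 1e-8)

    return np.hstack([zscore(image_emb), zscore(ftir_spec)])


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept term."""
    X1 = np.hstack([X, np.ones((X.shape[0], 1))])
    penalty = alpha * np.eye(X1.shape[1])
    penalty[-1, -1] = 0.0  # do not shrink the intercept
    return np.linalg.solve(X1.T @ X1 + penalty, X1.T @ y)


def predict(X, w):
    """Apply the fitted weights (last entry is the intercept)."""
    X1 = np.hstack([X, np.ones((X.shape[0], 1))])
    return X1 @ w


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Synthetic stand-in data (hypothetical shapes, not real measurements).
rng = np.random.default_rng(0)
image_emb = rng.normal(size=(50, 4))   # pretend ViT embeddings
ftir_spec = rng.normal(size=(50, 3))   # pretend FTIR band intensities
X = fuse_features(image_emb, ftir_spec)
aqi = X @ rng.normal(size=X.shape[1]) + 80.0  # synthetic linear AQI target

w = fit_ridge(X, aqi, alpha=1e-6)
print("R2 on synthetic data:", r2_score(aqi, predict(X, w)))
```

In this sketch an R² of 1 means the predictions match the reference AQI exactly, and an R² of 0 means the model does no better than always predicting the mean AQI, which is how the reported values of about 0.96 and 0.93 should be read.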