STREAMING SPEECH TO TEXT ON ANDROID A SOCKET.IO BASED SERVER APPROACH FOR ANDROID MOBILE APPLICATION

Authors

  • Douglas Rakasiwi Nugroho, Universitas Bina Nusantara
  • Christopher Limawan, Universitas Bina Nusantara
  • Kelvin, Universitas Bina Nusantara

DOI:

https://doi.org/10.58432/mxqehx55

Keywords:

Real-time ASR, Socket.IO, Server-side processing, Streaming Speech-to-Text, Android applications

Abstract

This paper presents a system for real-time Speech-to-Text (STT) on Android devices. Audio captured on the device is streamed to a Socket.IO-based server, which manages the audio streams and integrates with advanced language models. Offloading the computationally intensive STT and Natural Language Processing (NLP) tasks to a scalable server infrastructure addresses the main challenges of on-device processing, such as latency, power consumption, and computational overhead. This distributed processing design minimizes resource drain on the client device while maximizing accuracy and responsiveness.
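To make the streaming step concrete, the sketch below shows how a client might cut raw microphone audio into fixed-duration chunks before sending each one to the server, and how a server might buffer those chunks per session. It is not the authors' implementation. The audio format (16 kHz, 16-bit, mono PCM), the 100 ms chunk length, and the names `bytes_per_chunk`, `chunk_pcm`, `StreamSession`, and the `audio_chunk` event are all hypothetical, and the Socket.IO transport itself is left out so the example runs with the Python standard library alone.

```python
"""Minimal sketch: splitting raw PCM audio into fixed-duration chunks for streaming.

Assumptions (not from the paper): 16 kHz, 16-bit, mono PCM; 100 ms chunks.
All names here are hypothetical and for illustration only.
"""
from typing import Iterator


def bytes_per_chunk(sample_rate: int = 16_000, sample_width: int = 2,
                    channels: int = 1, chunk_ms: int = 100) -> int:
    """Return the number of bytes that cover chunk_ms milliseconds of audio."""
    frame_size = sample_width * channels          # bytes per sample frame
    frames = sample_rate * chunk_ms // 1000       # sample frames per chunk
    return frames * frame_size


def chunk_pcm(pcm: bytes, chunk_size: int) -> Iterator[bytes]:
    """Yield consecutive chunks of pcm; the last one may be shorter."""
    for start in range(0, len(pcm), chunk_size):
        yield pcm[start:start + chunk_size]


class StreamSession:
    """Hypothetical server-side buffer holding one client's audio stream."""

    def __init__(self, sample_rate: int = 16_000, sample_width: int = 2,
                 channels: int = 1) -> None:
        self.frame_size = sample_width * channels
        self.sample_rate = sample_rate
        self.buffer = bytearray()

    def on_chunk(self, chunk: bytes) -> None:
        # In a Socket.IO server this would run inside the handler for a
        # hypothetical "audio_chunk" event; here it only appends the bytes.
        self.buffer.extend(chunk)

    def duration_seconds(self) -> float:
        """Seconds of audio received so far."""
        return len(self.buffer) / (self.frame_size * self.sample_rate)


if __name__ == "__main__":
    size = bytes_per_chunk()            # 1600 frames * 2 bytes = 3200
    one_second = bytes(32_000)          # 1 s of silence at 16 kHz, 16-bit mono
    session = StreamSession()
    for chunk in chunk_pcm(one_second, size):
        session.on_chunk(chunk)         # client would emit; server would receive
    print(size, session.duration_seconds())  # prints: 3200 1.0
```

Shorter chunks reduce the delay before the server sees new audio but raise the message rate (ten messages per second at 100 ms), so the chunk length is a tuning parameter here rather than a value reported in the paper.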

Published

21-10-2025

How to Cite

Nugroho, D. R., Limawan, C., & Kelvin. (2025). STREAMING SPEECH TO TEXT ON ANDROID A SOCKET.IO BASED SERVER APPROACH FOR ANDROID MOBILE APPLICATION. Algebra: Jurnal Pendidikan, Sosial Dan Sains, 5(4), 838–843. https://doi.org/10.58432/mxqehx55