STREAMING SPEECH TO TEXT ON ANDROID: A SOCKET.IO BASED SERVER APPROACH FOR ANDROID MOBILE APPLICATION

Douglas Rakasiwi Nugroho; Christopher Limawan; Kelvin

doi:10.58432/mxqehx55

Authors

Douglas Rakasiwi Nugroho, Universitas Bina Nusantara
Christopher Limawan, Universitas Bina Nusantara
Kelvin, Universitas Bina Nusantara

DOI:

https://doi.org/10.58432/mxqehx55

Keywords:

Real-time ASR, Socket.IO, Server-side processing, Streaming Speech-to-Text, Android applications

Abstract

This paper details a system for real-time Speech-to-Text on Android devices that uses a Socket.IO-based server architecture to manage audio streams and integrate with advanced language models. By offloading the compute-intensive Speech-to-Text and Natural Language Processing tasks to a scalable server infrastructure, the approach addresses the main challenges of on-device processing: latency, power consumption, and computational overhead. This distributed design keeps resource drain on the client device minimal while maximizing accuracy and responsiveness.
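
The client-to-server audio flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the audio format (16 kHz, 16-bit mono PCM), the 100 ms chunk size, the message fields `seq` and `audio`, the handler name `on_audio_chunk`, and the `transcribe` callback are all assumptions made for illustration. The Socket.IO transport itself is omitted; each dictionary stands in for the payload of one emitted event.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator, List, Optional

SAMPLE_RATE_HZ = 16_000   # assumed: 16 kHz mono audio
BYTES_PER_SAMPLE = 2      # assumed: 16-bit PCM samples
CHUNK_MS = 100            # assumed: duration of one streamed chunk


def chunk_pcm(pcm: bytes, chunk_ms: int = CHUNK_MS) -> Iterator[dict]:
    """Client side: split raw PCM into sequence-numbered chunk messages."""
    chunk_bytes = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * chunk_ms // 1000
    for seq, start in enumerate(range(0, len(pcm), chunk_bytes)):
        # Each dict stands in for the payload of one Socket.IO event.
        yield {"seq": seq, "audio": pcm[start:start + chunk_bytes]}


@dataclass
class StreamSession:
    """Server side: per-connection buffer fed by incoming chunk messages."""
    transcribe: Callable[[bytes], str]   # hypothetical speech-to-text backend
    window_chunks: int = 5               # chunks to collect before transcribing
    _pending: List[bytes] = field(default_factory=list, init=False)
    _next_seq: int = field(default=0, init=False)

    def on_audio_chunk(self, message: dict) -> Optional[str]:
        """Handle one chunk; return text once a full window has arrived."""
        if message["seq"] != self._next_seq:
            raise ValueError(
                f"expected chunk {self._next_seq}, got {message['seq']}"
            )
        self._next_seq += 1
        self._pending.append(message["audio"])
        if len(self._pending) < self.window_chunks:
            return None
        return self.flush()

    def flush(self) -> Optional[str]:
        """Transcribe whatever is buffered, e.g. when the client stops recording."""
        if not self._pending:
            return None
        audio = b"".join(self._pending)
        self._pending.clear()
        return self.transcribe(audio)
```

In a deployment of this kind, the client would emit each chunk as it is recorded and the server would send the returned text back to the same connection. Keeping only chunking and transmission on the device is what moves the recognition cost to the server.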
