This paper addresses a critical gap in the geospatial reasoning capabilities of Large Language Models (LLMs). Despite their impressive performance in natural language understanding and generation, LLMs struggle with geospatial reasoning, which limits their potential in applications that require understanding of and interaction with the physical environment. We propose MapLLM, a comprehensive approach to enhancing the geospatial capabilities of LLMs through three key strategies: resource adaptation, knowledge incorporation, and customization. Our methodology includes developing a benchmark framework for evaluating geospatial understanding, enhancing text-only LLMs through fine-tuning on geospatially rich datasets, and designing a multimodal LLM architecture that jointly reasons over textual and geospatial inputs. We introduce a novel geospatial encoder and connector architecture, inspired by vision-language models, to effectively integrate geospatial information with LLMs. By leveraging OpenStreetMap data and other geospatial resources, we construct datasets that capture spatial relationships, geographic contexts, and hierarchical spatial structures. Potential applications of this research extend to autonomous navigation, intelligent transportation systems, urban planning, and context-aware assistants. Our work demonstrates that improved geospatial reasoning in LLMs can significantly enhance their utility in real-world scenarios where contextual information such as location, time, and user preferences is critical for providing relevant responses.
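The abstract does not specify how the geospatial encoder and connector are implemented, so the following is a minimal sketch of one way such a bridge could look, assuming a Transformer encoder over OpenStreetMap entities (coordinates plus a categorical tag) and a LLaVA-style MLP projection into the LLM's token-embedding space. All class names, dimensions, and hyperparameters below are illustrative assumptions, not details from the paper.

```python
# Hypothetical encoder-connector sketch: a geospatial encoder produces embeddings
# for map entities, and a small connector projects them into the LLM embedding
# space, mirroring how vision-language models bridge image features to an LLM.
# Module names, dimensions, and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn


class GeospatialEncoder(nn.Module):
    """Encodes a set of geospatial entities (e.g., OSM nodes with lon/lat and
    a categorical tag) into fixed-size feature vectors."""

    def __init__(self, num_tags: int = 512, d_model: int = 256, n_layers: int = 4):
        super().__init__()
        self.coord_proj = nn.Linear(2, d_model)            # lon/lat -> d_model
        self.tag_embed = nn.Embedding(num_tags, d_model)   # OSM tag id -> d_model
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, coords: torch.Tensor, tags: torch.Tensor) -> torch.Tensor:
        # coords: (batch, n_entities, 2), tags: (batch, n_entities)
        x = self.coord_proj(coords) + self.tag_embed(tags)
        return self.encoder(x)                             # (batch, n_entities, d_model)


class GeoConnector(nn.Module):
    """Projects geospatial features into the LLM embedding space, analogous to
    the MLP connector used in LLaVA-style vision-language models."""

    def __init__(self, d_geo: int = 256, d_llm: int = 4096):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(d_geo, d_llm), nn.GELU(), nn.Linear(d_llm, d_llm)
        )

    def forward(self, geo_features: torch.Tensor) -> torch.Tensor:
        return self.proj(geo_features)                     # (batch, n_entities, d_llm)


if __name__ == "__main__":
    encoder, connector = GeospatialEncoder(), GeoConnector()
    coords = torch.randn(1, 16, 2)                         # 16 nearby OSM entities
    tags = torch.randint(0, 512, (1, 16))
    geo_tokens = connector(encoder(coords, tags))          # "soft tokens" for the LLM
    print(geo_tokens.shape)                                # torch.Size([1, 16, 4096])
```

Under this kind of design, the connector (and optionally the encoder) could be trained while the base LLM remains frozen, mirroring common vision-language fine-tuning recipes; whether MapLLM follows that recipe is not stated in the abstract.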