With the scale of information growing every day, the key challenges in machine learning include the high dimensionality and sheer volume of feature vectors, which may comprise real and categorical data, as well as the speed and typically streaming format of data acquisition, which may also entail outliers and missing entries. The latter may arise unintentionally, or be introduced intentionally to cope with scalability, privacy, and adversarial behavior. These challenges provide ample opportunities for algorithmic and analytical innovations in online and nonlinear subspace learning approaches. Among the available nonlinear learning tools, those based on kernels have well-documented merits. However, most rely on a preselected kernel, whose prudent choice presumes task-specific prior information that is generally not available. It is also known that kernel-based methods do not scale well with the size or dimensionality of the data at hand. Besides data science, the urgent need for scalable tools is also a core issue in network science, which has recently emerged as a means of collectively understanding the behavior of complex interconnected entities. The rich spectrum of application domains comprises communication, social, financial, gene-regulatory, brain, and power networks, to name a few. Prominent tasks in all network science applications are topology identification and inference of nodal processes evolving over graphs. Most contemporary graph-driven inference approaches rely on linear and static models that are simple and tractable, but also presume that the nodal processes are directly observable. To cope with these challenges, the present thesis first introduces a novel online categorical subspace learning approach to track the latent structure of categorical data `on the fly.'
Leveraging the random feature approximation, it then develops an adaptive online multi-kernel learning approach (termed AdaRaker), which accounts not only for data-driven learning of the kernel combination, but also for the unknown dynamics. Performance analysis is provided in terms of both static and dynamic regrets to quantify how well the novel approach approximates the sought learning function. In addition, the thesis introduces a kernel-based topology identification approach that can account for nonlinear dependencies among nodes and across time. To cope with nodal processes that may not be directly observable in certain applications, tensor-based algorithms that leverage piecewise stationary statistics of nodal processes are developed, and pertinent identifiability conditions are established. To facilitate real-time operation and inference of time-varying networks, an adaptive tensor-decomposition-based scheme is put forth to track their topologies. Last but not least, the present thesis offers a unifying framework to deal with various learning tasks over possibly dynamic networks, including dimensionality reduction, classification, and clustering. Tests on both synthetic and real datasets from the aforementioned application domains are carried out to showcase the effectiveness of the novel algorithms throughout.
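
As a rough illustration of the random feature approximation mentioned above, the following minimal sketch builds random Fourier features for a Gaussian kernel and compares the resulting inner product with the exact kernel value. It is a generic textbook construction, not the thesis's AdaRaker algorithm; the function names (`gaussian_kernel`, `make_random_feature_map`), the bandwidth `sigma`, the number of features, and the sample points are all assumptions made for illustration.

```python
import math
import random


def gaussian_kernel(x, y, sigma=1.0):
    """Exact Gaussian (RBF) kernel value between two vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def make_random_feature_map(dim, num_features=4000, sigma=1.0, seed=0):
    """Return a map z(.) such that z(x) . z(y) approximates the Gaussian kernel.

    Frequencies are drawn from a zero-mean Gaussian with standard deviation
    1/sigma, and phases uniformly from [0, 2*pi). All defaults here are
    illustrative choices, not values taken from the thesis.
    """
    rng = random.Random(seed)
    freqs = [[rng.gauss(0.0, 1.0 / sigma) for _ in range(dim)]
             for _ in range(num_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)

    def feature_map(x):
        return [
            scale * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
            for w, b in zip(freqs, phases)
        ]

    return feature_map


# Compare the random-feature inner product with the exact kernel value.
feature_map = make_random_feature_map(dim=3)
x = [0.2, -0.5, 1.0]
y = [0.0, 0.3, 0.7]
zx, zy = feature_map(x), feature_map(y)
approx = sum(a * b for a, b in zip(zx, zy))
exact = gaussian_kernel(x, y)
print(f"exact kernel: {exact:.4f}, random-feature estimate: {approx:.4f}")
```

Because the feature map has a fixed, finite dimension, a linear model on top of it can be updated one sample at a time at a cost that does not grow with the number of samples seen, which is what makes such approximations attractive in streaming settings.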