The next generation of networked cyber-physical systems will support a number of application domains, e.g., connected autonomous vehicular networks, collaborative robotics in smart factories, and many other mission-critical applications. With the advent of massive machine-to-machine communication and IoT networks, huge volumes of data can be collected and processed with low latency through edge computing facilities. Distributed machine learning enables cross-device collaborative learning without exchanging raw data, thereby preserving privacy and reducing communication cost. Learning over wireless networks, however, poses significant challenges: limited communication bandwidth and channel variability, limited computational resources at the IoT devices, the heterogeneous nature of distributed data, and randomly time-varying network topologies. In this talk, we will present (i) low-complexity, communication-efficient Federated Learning (FL) algorithms based on approximate Newton-type optimization techniques employed at the local agents, which achieve a superlinear convergence rate as opposed to the linear rates achieved by state-of-the-art gradient-descent-based algorithms, and (ii) fully distributed network-Newton-type algorithms based on a distributed version of the well-known GIANT algorithm. While consensus-based distributed optimization algorithms are naturally limited to linear convergence rates, we will show that one can design finite-time-consensus-based distributed network-Newton-type algorithms that achieve superlinear convergence, albeit at the cost of an increased number of consensus rounds. We will conclude with new directions and results on zeroth-order techniques that can also achieve superlinear convergence rates in Federated Learning.
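To illustrate the flavor of the GIANT-style approximate Newton rounds mentioned above, the following is a minimal sketch, not the speaker's actual algorithm: it assumes a simple ridge-regression loss, synthetic data, and exact local solves. In each round the server averages local gradients, each agent solves its own local Newton system against that global gradient, and the server averages the resulting directions.

```python
# Hedged sketch of a GIANT-style approximate Newton round.
# Assumptions (not from the talk): quadratic ridge loss, 4 synthetic agents,
# exact local Hessian solves; names like `agents` are illustrative only.
import numpy as np

rng = np.random.default_rng(0)
d, lam = 5, 0.1
w_true = rng.normal(size=d)

# Synthetic data partitioned across 4 "agents"
agents = []
for _ in range(4):
    X = rng.normal(size=(50, d))
    y = X @ w_true + 0.01 * rng.normal(size=50)
    agents.append((X, y))

def global_grad(w):
    # Server averages local gradients of 0.5*||Xw - y||^2/n + 0.5*lam*||w||^2
    return np.mean([X.T @ (X @ w - y) / len(y) + lam * w
                    for X, y in agents], axis=0)

w = np.zeros(d)
for _ in range(10):
    g = global_grad(w)  # one communication round: gather/average gradients
    # Each agent solves its local Newton system H_i d_i = g ...
    dirs = [np.linalg.solve(X.T @ X / len(y) + lam * np.eye(d), g)
            for X, y in agents]
    # ... and the server averages the local Newton directions.
    w -= np.mean(dirs, axis=0)

print(np.linalg.norm(global_grad(w)))  # gradient norm after a few rounds
```

Because each agent's local Hessian approximates the global one, the averaged direction is close to an exact Newton step, which is the intuition behind the superlinear rates discussed in the talk; only one vector per agent is communicated per half-round, rather than a full Hessian.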