In model-based testing of safety-critical systems, structural coverage criteria have been widely used both to measure test suite adequacy and as a target when generating tests. We have found that the fault-finding effectiveness of tests satisfying structural coverage criteria is highly dependent on program structure; moreover, even if the faulty code is exercised, its effect may not be observable at the output. To address these problems, we define observability as a desirable attribute of testing: the effect of exercising a structural part must manifest itself at a subsequent observable point in the program. We further propose an incremental test generation approach that combines the notion of observability with dynamic symbolic execution. Our results show that observability, combined with the incremental test generation approach, is effective at detecting faults, robust to program restructuring, and efficient in generating tests.

On the other hand, advances in automated test generation from system models do not always translate into realizable benefits when testing an implementation of the system, because platform-specific details are often abstracted away to make the models amenable to various analyses. Testing an implementation to expose non-conformance to such a model requires reconciling the differences arising from these abstractions. Previously proposed approaches address this by being reactively permissive: passing criteria are relaxed to reduce false positives, but this may increase false negatives, which is particularly problematic for safety-critical systems. To address this concern, we propose an automated approach that is proactively adaptive: test stimuli and system responses are suitably modified to take platform-specific aspects into account, so that the modified test, when executed on the platform-specific implementation, exercises the intended scenario captured in the original model-based test. We show that our framework eliminates false negatives while keeping the number of false positives low for a variety of platform-specific implementations.
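
As an informal illustration of the observability concern raised above, the following toy sketch shows two test suites that exercise the same structural part, only one of which lets the effect reach the output. It is not the criterion or tooling developed in this work: the functions `alarm`, `alarm_faulty`, and `detects_fault`, the seeded fault, and both test suites are hypothetical and chosen only to show masking.

```python
def alarm(sensor_high: bool, override: bool, enabled: bool) -> bool:
    """Hypothetical reference behavior."""
    # Structural part of interest: the expression computing `trigger`.
    trigger = sensor_high or override
    # `enabled` can mask `trigger`: when it is False, the output is
    # False regardless of the value of `trigger`.
    return trigger and enabled


def alarm_faulty(sensor_high: bool, override: bool, enabled: bool) -> bool:
    """Same function with a seeded fault: `or` replaced by `and`."""
    trigger = sensor_high and override
    return trigger and enabled


def detects_fault(tests) -> bool:
    """A suite detects the fault if some test yields different outputs."""
    return any(alarm(*t) != alarm_faulty(*t) for t in tests)


# Both suites drive `trigger` to True and to False in the reference
# version, so both exercise the structural part in the same way.
masked_suite = [(True, False, False), (False, False, False)]
observable_suite = [(True, False, True), (False, False, True)]

print(detects_fault(masked_suite))      # effect of `trigger` never reaches the output
print(detects_fault(observable_suite))  # effect of `trigger` propagates to the output
```

In the masked suite `enabled` is always False, so the faulty value of `trigger` cannot influence the result; requiring that the effect be visible at an observable point rules out such tests.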