Front row (l to r): Tom Bartol, Markus Covert, John Glass, Drew Berry, Art Olson, Sai Li. Back row: Beata Mierzwa, Janet Iwasa, Ludovic Autin, Anders Ynnerman, Peter Mindek, Rommie Amaro, Bridget Carragher, David Goodsell, Alexander Rose, David Sehnal, Graham Johnson, Vishal Sivasankar, Ivan Viola, Deng Luo, Donggang Jia. Not shown in photo: Danielle Grotjahn and Tim Davison.
In January of 2024 a targeted conference, “CellVis2,” was held at Scripps Research in La Jolla, California, the second in a series designed to explore the promise, practices, roadblocks, and prospects of creating, sharing, and communicating a physical representation of entire biological cells at scales down to the atom. For this purpose, a group of 25 attendees representing various perspectives on this important but daunting goal gathered for three days. They comprised experimentalists, computational scientists, computer scientists, and scientific artists and communicators, many of whom individually spanned fields. They also represented a diversity of career stages, from laboratory heads to early-stage investigators, as well as two graduate students.
The meeting was formatted to be as interactive as possible. Limited to 25 participants, all were given equal opportunity for input and discussion. Our first session was a series of five-minute
“lightning talks” in which each participant summarized what they brought
to and expected from the meeting. The bulk of the meeting consisted of
five panel discussions, with chosen panelists leading the discussions on the
challenges we face:
1- Data Gathering, Integration, Sharing and Retrieval – discussing challenges and future directions related to data formats, standards, and integration for whole cell modeling.
2- Imaging – from volumetric to segmented and annotated 3D data – discussing imaging techniques from molecular to tissue scale, segmentation, reconstruction, and data management.
3- Modeling – from molecular to ultrastructure – discussing different computational approaches for the spatio-temporal modeling of molecules, pathways, dynamics, organelles, and whole cells.
4- Visualization – multiscale, multimodal, multidata, dynamic – discussing techniques for multiscale, multidata visualization and interaction for analysis of very large spatio-temporal models and data.
5- User Interface and Communication – from technology to human use and comprehensibility for scientists, educators, and the general public.
In addition to the panels and general discussions, there was time available to experience works created by conference participants displayed on virtual and augmented reality (VR/AR) headsets, including an AR ChatGPT-guided exploration of a T4 phage from Viola’s team; an AR experience in active learning about virus quasi-symmetry from Autin and Olson; and a 30-minute 3D dome presentation, “The Chemistry of Life,” produced by Anders Ynnerman and Drew Berry.
Associated with the conference was a public event entitled “Art Meets Science: Visualizing the Hidden Beauty and Complexity Inside the Living Cell,” a combination of an art show (curated by CellVis2 participant Beata Mierzwa) and four short presentations by CellVis2 participants (David Goodsell, Peter Mindek, Janet Iwasa, and Graham Johnson).
The process of building whole cell models is massively integrative, requiring data from multiple disciplines. A selection of data types and sources is included in Table 1. Five panelists presented brief overviews of their research in whole cell modeling and their approaches to data integration challenges.
Bridget Carragher introduced the Chan Zuckerberg Imaging Institute's mission, which is focused on enabling deep insights into the architecture of complex biological systems through the development of technologies. These technologies facilitate imaging of molecular structures within cells to near-atomic resolution using cryo-electron tomography (cryoET). Despite the ease of accessing data for individual cellular components, the challenge lies in annotating the overall molecular architecture due to crowded environments and poor contrast in cryoET images. Carragher emphasized the necessity for the machine learning community's expertise to enhance cryoET annotation and segmentation algorithms.
Markus Covert discussed the essentials for modelers to make qualitative predictions in whole cell modeling. He detailed two research initiatives: one on gene expression simulation in E. coli growth and another on tRNA aminoacylation. These studies underscore the evolving understanding of biological processes and the importance of updating mathematical models to reflect new knowledge for improved simulation outcomes.
John Glass shared insights from the minimal cell project, specifically JCVI-syn3.0, which aims to define the minimal set of genes necessary for cellular life. This project's streamlined nature facilitated the development of whole cell simulation models by Zan Luthey-Schulten's group, enabling comparisons with experimental data. Glass highlighted the project's implications for cancer treatment through targeted gene expression strategies.
David Goodsell shared his unique journey of visually capturing the intricate molecular world of cells, a project he embarked on in the 1990s with his first depiction of an E. coli interior. Every decade, Goodsell updates his illustrations to reflect the expanding knowledge base and advancements in data acquisition techniques. This thirty-year endeavor underscores not just the vast amount of information gleaned but also the improved accessibility to proteomics and structural data. However, the abundance of data introduces its own set of challenges, notably the inconsistencies between proteomic datasets and the lack of detailed information on the homo- and hetero-oligomeric states of proteins. These issues highlight the critical role of data annotation in bridging the gap between raw data and meaningful insights into molecular structures and functions.
Ludovic Autin addressed the technical challenges in constructing 3D models of biological complexes, emphasizing the time-intensive nature of data gathering and the issues surrounding large dataset sharing. He discussed the variety of current formats used for 3D whole cell modeling, highlighting the debate over maintaining a balance between including extensive metadata and linking essential information through unique IDs. Autin advocated for a unified format that could support the diverse needs of biological modeling.
Discussion after the talks covered the following issues.
Whole cell modeling faces the same challenges as other data-driven disciplines, as embodied in FAIR principles. First, data must be Findable and Accessible. To address this, a multiplicity of public databases have emerged covering most data types. In most mature cases, these databases are expertly curated, and include effective methods for search and retrieval of salient information.
The principles of Interoperability and Reusability, however, still pose the greatest challenges. This is particularly true for whole cell modeling, where data from multiple disciplines is required. Often these are provided with domain-specific formats and metadata which must be parsed and translated during the integration process. Unfortunately, adoption of standardized formats can often have a chilling effect on research, particularly in new disciplines, so the problem of reading, parsing, and integrating multiple data formats can be expected to remain an ongoing challenge. We are at a point in this endeavor where community building will be essential to foster the language that we will need to communicate between different disciplines.
Provenance and confidence levels for diverse data types remain a central unresolved issue in integrative modeling. In an ideal world, these would be propagated through the integrative modeling process, informing the confidence in properties of the final model.
Simulation and prediction are playing an increasingly important role. Most structural models of cells are constructed primarily of homology models or computed structure models. Microscopy typically locates membranes and large assemblies, and positions for all other molecules are inferred from simulation. Abundances in a structural model may be taken directly from time points in a whole cell metabolic network simulation.
We are also standing on the cusp of a revolution in cellular imaging. CryoET resolution is steadily improving, resolving ever finer details. In most cases, assemblies like ribosomes, membranes and filaments are unambiguously resolved. In the best cases, diverse proteins may be identified using templated or template-free methods. To assist with development of these segmentation methods, whole cell models are being used to create simulated tomograms, providing a ground truth model for evaluation of new methods.
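The phantom-tomogram idea can be sketched in a few lines: build a known ground-truth volume, then degrade it the way an experimental tomogram would be, with noise and a missing wedge. The sketch below is an illustrative toy, not any published pipeline; `toy_phantom`, `degrade`, and all parameter values are hypothetical, and NumPy is assumed available.

```python
import numpy as np

def toy_phantom(shape=(64, 64, 64), n_particles=20, radius=4, seed=0):
    """Toy ground-truth volume: random spheres standing in for macromolecules."""
    rng = np.random.default_rng(seed)
    vol = np.zeros(shape, dtype=np.float32)
    zz, yy, xx = np.indices(shape)
    for _ in range(n_particles):
        c = rng.integers(radius, np.array(shape) - radius)
        mask = (zz - c[0])**2 + (yy - c[1])**2 + (xx - c[2])**2 <= radius**2
        vol[mask] = 1.0
    return vol

def degrade(vol, noise_sigma=0.5, wedge_half_angle_deg=30, seed=1):
    """Simulate tomogram-like degradation: additive Gaussian noise plus a
    crude missing wedge (Fourier components near the beam axis zeroed)."""
    rng = np.random.default_rng(seed)
    noisy = vol + rng.normal(0.0, noise_sigma, vol.shape).astype(np.float32)
    f = np.fft.fftshift(np.fft.fftn(noisy))
    kz, ky, kx = np.indices(vol.shape) - np.array(vol.shape)[:, None, None, None] // 2
    # Angle of each frequency component away from the tilt (xy) plane.
    angle = np.degrees(np.arctan2(np.abs(kz), np.hypot(kx, ky) + 1e-9))
    f[angle > 90 - wedge_half_angle_deg] = 0
    return np.real(np.fft.ifftn(np.fft.ifftshift(f))).astype(np.float32)

vol = toy_phantom()
deg = degrade(vol)
```

Because the undegraded `vol` is known exactly, any segmentation method run on `deg` can be scored against it, which is the role such simulated tomograms play in benchmarking.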
AI is also already revolutionizing the data and imaging aspects of whole-cell modeling. It is proving to be essential for “filling the holes” in the required data. AlphaFold2 is the most obvious example, providing models for entire proteomes and filling the gaps between available experimental structures.
Panel 2 was focused on volumetric to segmented and annotated 3D data, with panelists discussing imaging techniques from molecular to tissue scale, segmentation, reconstruction, and data management. This panel topic is crucial for whole cell modeling since it provides the experimental evidence for different parts of the cell. The three talks highlighted the different scales and approaches used by the imaging communities, and also how to simulate them.
Danielle Grotjahn presented her lab’s overall approach to cryo-tomography to study the structure and function of mitochondria in situ. The focus is on understanding how mitochondria dynamically remodel their shape to perform diverse functions, such as responding to cellular stress. She detailed the importance and challenges of membrane segmentation, which has benefited greatly from deep learning (MemBrain-Seg) and has given her the opportunity to study in more detail the curvature and topology of the mitochondrial membrane. She presented her work on surface reconstruction using a voxel-to-mesh approach, which enables accurate modeling of membranes for quantitative analysis, and her planned future developments. She highlighted the lack of an easy method to identify specific molecular species in the tomogram, due both to limited resolution with a poor signal-to-noise ratio and to the lack of available software.
Sai Li presented his work on the SARS-CoV-2 virus and the effect of IgG incubation using cryo-tomography. He also detailed his work on an Adenovirus serotype 5 vector vaccine. He recounted the history of cryoET technology from 2013 to 2023 to highlight the resolution revolution, which combines advances in microscope technology with increasingly capable software.
Ludovic Autin detailed how he uses cell-scale molecular models to generate phantom data to reconcile model and experiment. He presented three different cases: fluorescence, soft X-ray, and cryo-EM tomography. He highlighted that the key points were collaboration, better quantitative analysis tools, and the need for more readily available tools to generate TEM phantoms from atomistic molecular models rather than simple geometric ghosts. He mentioned the potential for deep learning and training.
After the panel talks, an open discussion highlighted and expanded on what was presented. Panelists discussed advancements and challenges in the field of cryo-electron microscopy (cryo-EM) and tomography. They reflected on the rapid progress from basic structural identification to analyzing intricate molecular interactions, while acknowledging current limitations such as low contrast and algorithmic challenges in tomography. The integration and potential of AI in enhancing cryo-EM techniques was a key focus, especially for accurately annotating and identifying structures at the molecular and atomic level. There was a debate on whether to develop in-house machine learning solutions or wait for commercial advancements. The discussion also drew parallels to other scientific fields, highlighting the importance of massive data accumulation and the use of simulated data for AI training. The concept of creating indistinguishable simulated cryo-EM data, akin to a Turing test, was also explored, showcasing the ambitious goals and evolving nature of the field. Several key points emerged: the importance of provenance and metadata integration, and the growing use of predicted models (e.g., AlphaFold). The segmentation and annotation of membranes and filaments is mostly solved, but for proteins it remains a major challenge; currently, only the ribosome can be routinely segmented and detected. It was felt that we can do better by using more annotated data for training, and that advances in hardware and software will improve the signal-to-noise ratio as well as account for missing-wedge artifacts. Also noted as important in the challenge of understanding whole cell structure is bringing different fields together, as each has its own terminology and perspective, highlighting the need for community building.
Moreover, advances in imaging are moving towards models that merge virtual predictions with real observational data, exemplified by the use of virtual models to enhance our understanding of cellular structures. This blend of AI, curated data, and sophisticated imaging marks a new phase in research, where virtual and empirical data converge to accelerate scientific discovery.
Panel 3 focused on modeling cells from molecular to ultrastructure and spatial distribution, discussing different computational approaches for the spatio-temporal modeling of molecules, pathways, dynamics, organelles, and whole cells. The goal is to understand the complexity of whole cell structure, showcasing its highly dynamic trans-scale nature. Five panelists presented talks on their research and points of view, covering different simulation scales from atomistic to cellular behavior and how to represent them.
Rommie Amaro presented her group’s work on whole-virus all-atom molecular dynamics simulation with a reductionist approach that focuses on high-quality components, such as well-defined N- and O-linked glycans for surface proteins like hemagglutinin. She highlighted how hardware and software developments helped improve the completion time from seven years for the influenza system (160M atoms) down to seven months for the SARS-CoV-2 virion (350M atoms); her group is now capable of running simulations of up to one billion atoms. In the follow-up discussion she gave technical details on the simulation timestep (in the femtosecond range) and on the storage of trajectories, which take up dozens of TB; while storage is not a bottleneck, she mentioned the difficulty of visualizing such a large, detailed dynamic system.
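The trajectory-storage figures quoted above are easy to reproduce with back-of-envelope arithmetic. The sketch below assumes single-precision coordinates (4 bytes each) and a hypothetical count of 10,000 saved frames; with those assumptions, a 350-million-atom trajectory indeed reaches tens of terabytes.

```python
def frame_gigabytes(n_atoms, bytes_per_coord=4):
    """One trajectory frame stores x, y, z per atom; single precision assumed."""
    return n_atoms * 3 * bytes_per_coord / 1e9

# SARS-CoV-2 virion scale from the talk: ~350 million atoms.
per_frame = frame_gigabytes(350_000_000)   # ~4.2 GB per frame
total_tb = per_frame * 10_000 / 1e3        # hypothetical 10,000 saved frames
```

Compression and reduced save frequency lower these numbers in practice, but the order of magnitude matches the "dozens of TB" reported.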
Tom Bartol gave a comprehensive description of MCell. MCell uses highly optimized Monte Carlo algorithms to track the stochastic behavior of millions of discrete molecules represented as point particles in space and time as they diffuse and interact with other discrete effector molecules (e.g. ion channels, enzymes, transporters) heterogeneously distributed within the 3-D geometry of the subcellular environment, spanning timescales in the millisecond to second range.
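The particle-based stochastic approach MCell embodies can be illustrated with a minimal Brownian-dynamics random walk. This is a toy sketch, not MCell's actual algorithm (which also handles reactions, surfaces, and complex geometry); the function name and parameter values are hypothetical.

```python
import math
import random

def diffuse(positions, D, dt, steps):
    """Free Brownian diffusion of point particles: each coordinate receives a
    Gaussian displacement with standard deviation sqrt(2*D*dt) per timestep."""
    sigma = math.sqrt(2.0 * D * dt)
    for _ in range(steps):
        positions = [tuple(c + random.gauss(0.0, sigma) for c in p)
                     for p in positions]
    return positions

# 1,000 particles released at the origin; units: um^2/s for D, seconds for dt.
random.seed(0)
end = diffuse([(0.0, 0.0, 0.0)] * 1000, D=10.0, dt=1e-6, steps=100)
```

The mean squared displacement of the ensemble approaches the theoretical 6*D*t for free 3D diffusion, which is the usual sanity check for such schemes.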
Vishal Sivasankar described his work on colloidal interactions and how they are connected to life-essential processes in the cell. After describing ultra-weak protein-protein interactions (UW-PPIs) and how they represent an understudied interaction regime, he presented results quantifying pair-wise binding events in their particle dynamics simulations. He showed that UW-PPIs can be influential enough that proteins spend most of their time in binding events. He also showed that the prevalence of UW-PPIs in cells increased from 2% up to 14% upon addition of essential protein complexes such as ribosomal subunits.
Graham Johnson presented work from the Allen Institute for Cell Science focusing on advancing cell biology understanding through open-source software development and community building. He followed with a detailed overview of what would be the next-gen virtual cells (NGVCs), which he defines as dynamic 3D models that integrate information from diverse sources including simulations, biophysical models, image-based models, and evidence-based knowledge graphs. He then noted that achieving these NGVCs will rely upon: 1) communication between experimental cell biologists and computational modelers, who use different languages; 2) asking the proper questions; and 3) bridging the large range of time and spatial scales involved in any biological process (ns to min, nm to mm).
Peter Mindek gave an aesthetic presentation around the question of illustrating cells and their molecular components for outreach and comprehensible explanation. His talk explored two key points: the importance and usage of abstraction depending upon the audience, and the advantage of using procedural, reusable, parameterizable building blocks for interactive storytelling of the molecular world, which he demonstrated with a dynamic model of microtubule assembly and disassembly.
The discussions and presentations from Panel 3 underscored the interdisciplinary effort required to bridge the gap between computational modeling and biological understanding at all scales, from molecules to whole cells. The panelists, through their diverse approaches and innovative research, highlighted the importance of computational power, algorithm optimization, and visualization techniques in pushing the boundaries of what can be simulated and understood about cellular and molecular dynamics. Rommie Amaro's work on molecular dynamics simulations, Tom Bartol's insights into stochastic behavior modeling with MCell, Vishal Sivasankar's exploration of colloidal interactions, Graham Johnson's vision for next-generation virtual cells, and Peter Mindek's approach to visualization and outreach all contribute to a comprehensive understanding of cellular complexity and dynamics. These contributions, collectively, pave the way for future research that is not only more integrative and holistic but also more accessible to a broader audience. The panel vividly demonstrated that the future of cellular and molecular biology lies in the successful integration of computational models with experimental data, a collaboration that will undoubtedly accelerate discoveries and enhance our understanding of life at the most fundamental levels.
The visualization panel comprised a group of technologists and scientists who work on ways to create and represent the spatial and temporal nature of cellular landscapes and processes. The panel illustrated how computer scientists and structural biologists work together in the field of scientific visualization, and how they have addressed the overwhelming issues of data size, variation, and multi-scale character in depiction and interaction. The panel talks and discussions, as reported below, highlighted that, while current graphics technology enables the depiction of tens of billions of spheres depicting atoms at interactive frame rates, the major problem is placing the biological and structural information into an interpretable and interactive visual context.
Ivan Viola discussed how biological complexity can be efficiently matched with instancing capabilities of graphics hardware, such that rendering of atomistic models of entire biological systems at interactive frame rates is possible. Furthermore, the density of a biological scene allows for efficient occlusion culling utilization to speed rendering, and the scale range of biological systems allows a dynamic level-of-detail strategy enabling smooth interactive viewing and navigation. Visualizations of 3D interactive models of a simple cell such as a Mycoplasma with all its molecular components can now be deployed into web applications or immersive environments. Additionally, hardware-accelerated ray-tracing can further push interactive visualization to trillion-atom mesoscale scenes. Viola is now working on the challenge of what he terms “Conversational Visualization” where a user can talk to and interact with an AI agent to navigate, visualize, and query cellular entities.
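The distance-dependent level-of-detail strategy can be illustrated with a minimal selection rule. This is a hypothetical sketch (the function name and thresholds are invented for illustration); production renderers of the kind Viola described switch among many precomputed representations per instance inside the GPU pipeline.

```python
def pick_lod(distance, near=50.0, far=500.0):
    """Illustrative level-of-detail choice for one molecular instance:
    full atomistic detail up close, coarse spheres at mid range, and a
    single impostor sprite far away. Thresholds are arbitrary examples."""
    if distance < near:
        return "atomistic"
    if distance < far:
        return "coarse-spheres"
    return "impostor"
```

Because most instances in a crowded cellular scene are far from the camera (or fully occluded), such a rule lets the renderer spend its per-frame budget on the few instances that actually need atomistic detail.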
Drew Berry focused on the process of creating his famous animations and how he is able to portray the complexity, randomness and dynamics of the cellular activity while still making it comprehensible. He illustrated what he called 'molecular choreography' of biological processes with several examples, showing how he maintains the illusion of randomness and how this approach necessitates a careful balance between accuracy and understandability. He emphasized that the animation is not a simulation, and discussed his deployment of animated loops of asynchronous instances of the molecular components to give the impression of chaotic motion and interaction happenstance. He also discussed the use of sound to enhance the perception of movement and to drive attention to the action.
Tim Davison showed demonstrations of his virtual reality (VR) procedural modeling and simulation application “CellBrush,” which leverages texture synthesis techniques to create a large number of instances of low-detail models of molecular components, coupled with a physics engine to resolve collisions and reactions. He showcased how he utilized CellBrush to bring one of David Goodsell’s paintings to life in 3D. He then discussed and showed his current work in developing “CellWalk,” which uses high-resolution, cell-scale models in an interactive storytelling format targeted to mobile Apple devices such as the iPad and the just-released Vision Pro. He was enthusiastic about the new technology of that device, as well as new visualization technologies like Gaussian splatting and AI for capturing 3D objects and environments for web applications.
Alexander Rose presented his work on the evolution of Mol* (Molstar), a library and toolkit for building web applications, supporting servers for molecular structures, volumes, and annotation efforts at the Protein Data Bank. He focused on his latest development, Mol* Explorer, which supports the interactive visualization of mesoscale models in a web browser. He discussed the advances in visualization technologies used to facilitate smooth and visually interpretable images of models as large as bacteria and organelles, supporting multiple molecular rendering styles, materials, volumes, transparency, and lighting properties. To maintain interactive rates with scenes that can contain billions of atoms, he utilizes hardware instancing to reduce memory requirements, along with level-of-detail and occlusion techniques to render only currently visible objects.
David Sehnal delved more deeply into the computational components needed to enable large scale molecular visualizations from the cell down to its chemical entities by incorporating information from experimental and AI-derived structure data, integrative hybrid models, and simulations. He described the modular building block approach developed for Mol* to enable the computational workflow needed to handle the process from data delivery and representation through structure description and query to application and view state.
It was the consensus of the panel that current computational technologies can now enable visualization and interactions with complex biological structures of billions of atoms at interactive rates for simple cells and organelles, even on modern web browsers. Difficulties may arise in the area of data size and experimental sources for larger cells, as complexity and dynamic time-scales increase. The importance of modularity and common formats and data structure standardization will require broader community cooperation and sharing between experimentalists and visualizers. One possible way could be to extend formats such as mmCIF with procedural rules, similar to the description of larger symmetric assemblies.
The panel discussion highlighted the following exciting future directions: (I) How to leverage recent advances in 3D view synthesis technology, including Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS) representations. (II) How conversational analysis and discovery can enhance our navigation and interpretation of large and complex molecular landscapes. Imagine having an AI agent that can guide you through the most important, interesting, or challenging aspects of a 3D model, suggesting modifications, revisits, or adjustments. (III) How to utilize style transfer through neural networks or a differentiable approach in the analysis of models and experimental images. (IV) Finally, the increasing computing capabilities of browser-based approaches, thanks to new standards such as WebGPU, open the door to developing tools for a broader audience. These discussions pave the way for innovative methodologies in scientific visualization, promising to revolutionize how we interact with and understand complex biological data.
Panel 5 explored the practice and challenges of interfacing with and communicating the nature of complex cellular environments. Panelists discussed the relationship between science and its communication within a wide variety of contexts, from creation through delivery to a targeted audience. A wide variety of media was discussed in depicting the complexity of cellular environments at multiple spatial and temporal scales to audiences that ranged from domain scientists to students of various ages and backgrounds.
David Goodsell described his process of creating watercolor paintings of these environments in terms of the continuing evolution of scientific knowledge, and the limitations of the method as the systems depicted grow in size. Beyond traditional painting, he noted that for illustrating entire cells and their interactions, digital tools like Photoshop have become instrumental in generating 2D images with an extensive array of complex molecules. He then talked about universalizing his well-known aesthetic approach using an interactive tool, CellPAINT, which enables anyone to utilize his visual style to create their own cellular imagery in 2.5D.
Janet Iwasa described her approaches to animating cellular processes utilizing the same tools used in professional animation environments, emphasizing the challenges of dealing with ever-evolving scientific knowledge and the back-and-forth discussion between scientist and artist. She discussed approaches to annotate animations with metadata to provide provenance and a measure of confidence in the visualization, as well as to enable ongoing discussions and comments on the material at specific time points in the animation. She illustrated a concrete application of this approach with the release of a web-based tool that combines a detailed animated molecular working model of different stages of the coronavirus life cycle with mechanisms for the research community to annotate and discuss aspects of the model.
Art Olson talked about expanding the user experience from visualization to “perceptualization.” He argued that understanding complex structural information can be enhanced by bringing in multiple modes of perception, from stereoscopy to tangibility, proprioception, and immersion. He discussed the cognitive advantages of physical models in understanding biomolecular structure and showed early examples of 3-D printed models superimposed with virtual structures using augmented reality (AR). He presented a recent AR application for exploring virus symmetry and quasi-symmetry that he has tested in settings from middle school upwards, emphasizing the level of presence and engagement users experienced while learning through immersive active 3D manipulation. He listed the challenges of current technology, including cost, headset acceptance and comfort, controller and user interface difficulties, as well as physiological and psychosocial issues. He was optimistic that the technology and software are evolving rapidly, and that there will be wider use in the near future.
Beata Mierzwa presented her approach to communicating complex cellular processes by coupling her artistic talents with her cell biology expertise to create metaphorical representations of the machinery of life. Her outreach work spans from bioscience-inspired fashion to Microscopya, a game that enables students of all ages to explore the inner workings of the cell using simple mechanical metaphors for biological processes. With its cartoon-like aesthetic, it features the user as a young girl who controls the cell’s operations and competes for points. Discussions followed regarding the nature of “artistic liberty” and its use to create an intuitive understanding; how to engender best practices in creating such visualizations; and how to create collaborative communities of artists and scientific visualizers.
Anders Ynnerman discussed the art of scientific visualization in the context of informal education in a public science center or museum setting. He presented work in two types of experiences: a large-scale audience immersive dome projection, and a small-group exploration activity around an interactive 3D table display. He talked about his newest large dome stereoscopic production, “The Chemistry of Life,” produced with Drew Berry, and pointed out that audiences in a large-format dome theater have expectations of “Pixar” quality animations, which requires high production values and their associated costs. He emphasized that the venue and its immersive quality help in explaining, exploring, engaging, and entertaining. The “frameless” format, however, requires a different approach to compositional storytelling than traditional animation with regard to camera and object motion, cuts, localization of sound, and pacing. His more interactive production on the SARS-CoV-2 virus and pandemic presents its own challenges and opportunities. He emphasized the goal of “bullet proofing” the exhibit, since in an interactive museum setting “everything breaks.” On the other hand, the exhibit can effectively and interactively guide across scales of space and time, going from the molecular structure of the virus to its impact on the body, to its impact on the planet.
A major consensus of the panel was that the process of creating the visual communication itself is often an important part of the science of structural cell biology, and is critical in assembling data and models from different sources into a coherent picture of the phenomena it attempts to portray and explain. Numerous examples from panelists emphasized the back-and-forth between the scientists and scientific visualizers, from initial discussions and sketches to the final images, animations, or interactive environments. This process of producing a finished product requires an acknowledgment that typically not everything is known, or even agreed upon, regarding the science being portrayed. This compels both the scientist and the visualizer to decide what to emphasize and how to convey the main message of the visualization. Several examples were discussed in which a visualization prompted scientific colleagues to challenge aspects of the portrayal, which in turn prompted further research and subsequent revision. In many contexts the visualization presents the challenge of avoiding misconceptions in the target audience, and of how to indicate or portray unknowns and uncertainty in the visualization.
Another important point about the creation and utility of scientific visualizations focused on the multi-disciplinary nature of the process, since it involves experience in science, technology, and aesthetic communication. All of the panelists bridged two or more of these perspectives. This multi-disciplinarity raised the issue of training and cross-training in scientific visualization, as well as the need for more tightly integrated meetings across fields.
The CellVis2 conference was kept small, with a focus on bringing together a wide range of perspectives from a diversity of participants, both in fields of expertise and in career experience. This, coupled with the panel format giving all participants equal opportunity for discourse and discussion, enabled a strong sense of camaraderie and a free flow of questions and ideas. Existing collaborations were reinforced and expanded, and new ones were initiated. The experts in each of the five panels found significant commonality in scientific, technical, and community challenges.
The obvious challenges of the scale and complexity of spatio-temporal data and models were discussed in all five panels, from data sharing to visual interpretation and communication. On the whole there was significant optimism regarding the acquisition of more data, as well as the computational capabilities for dealing with its size, scope, and visualization. Many of the more problematic roadblocks arose from the difficulty of developing widely accepted, community-wide standards and methods for more effective exchange and flow of data, models, and user interfaces that will allow the field to grow.
Not unexpectedly there was much discussion about the promise of Deep Learning and other AI methods to expand capabilities to the next level of size and complexity. It was felt that some of the desired applications such as image segmentation and annotation are of active current interest, while others such as large data generation and integration are still mostly future prospects. As computational methods for modeling, visualization and communication become more expansive and capable, the problem of learning how to use them effectively grows, and the role of AI interfaces and tutors will become more important.
The public Art Meets Science event featured a dozen artworks from local artists as well as CellVis2 participants, and four talks that highlighted the bridge between science and art. The capacity crowd was highly engaged, and feedback from the lay audience was enthusiastic.