Building an Open Data Infrastructure for Research: Turning Policy into Practice

Juan Bicarregui
Head of Data Services Division
STFC Department of Scientific Computing

IDCC 2013, International Digital Curation Conference, 14-17 January 2013, Amsterdam

Overview
  • The Policy Context
  • OECD
  • EC/NSF/…
  • G8+5
  • RCUK
  • Royal Society
  • G8
  • PaNdata
  • Photon and Neutron Open Data Infrastructure
  • The Research Data Alliance
  • Fostering Collaboration on a global scale

1. The Policy Context
  • OECD, 2004-2006: Principles and Guidelines for Access to Research Data from Public Funding
  • EC, 2007-2012: Recommendation on access to and preservation of scientific information
  • G8+5, 2011-2012: Global Research Infrastructure Sub Group on Data
  • Research Councils UK, 2011: Joint Principles on Data
  • Royal Society, 2011-2012: Science as an Open Enterprise
  • G8 Ministerial Statement, 2013: Grand Challenges, Global Research Infrastructures, Open Scientific Research Data, Open Access

The views expressed herein are the personal views of the author and do not necessarily reflect the views of the policy makers.

The Innovation Lifecycle
[Figure labels: Economic Impact, Enabling Wealth Creation, Enabling Knowledge Creation, Strategic Direction, The Body of Knowledge, The Research Process, The Government Process, Improved Understanding, Improved Quality of Life, Quality Assessment]
Aggregation of Knowledge lies at the heart of the innovation lifecycle

[Figure: Single Infrastructure, Single User Experience. Labels: Raw Data Catalogue, Analysed Data Catalogue, Publication Data Catalogue, Data Analysis]

[Figure: Different Infrastructures, Different User Experiences. Labels: Publications Catalogue, Raw Data, Data Analysis, Analysed Data, Publication Data, Publications; Simulation 3, Experiment 1, Observation 2]

[Figure labels: Capacity, Storage, Software Repositories, Data Repositories, Publications Repositories, Technology Sharing]

Research Environment
[Figure: the researcher acts through ingest and access. Labels: Archival, Creation, Access, Services, Information Infrastructure, Data, Network, Storage, Compute]
The researcher shouldn’t have to worry about the information infrastructure

[Figure labels: Open Science, Provenanced Research Data]

RCUK principles: Data are a Public Good
Publicly funded research data are a public good, produced in the public interest, which should be made openly available with as few restrictions as possible in a timely and responsible manner that does not harm intellectual property.
  • Public good: is nonrival and non-excludable [wikipedia]; consumption by one does not reduce availability for others, and no one can be effectively excluded from using it
  • Research Data: recorded factual material commonly retained by and accepted in the scientific community as necessary to validate research findings
  • As few restrictions as possible: Later (distinguish registration from restriction)
  • Timely: Later (discipline specific)
  • Responsible: Later (maximising access does not necessarily maximise research benefit)
  • Intellectual Property: Later (balance contribution from sharing and from primary research)

RCUK Principles on Data Policy
  • Data should be managed
  • Data should be discoverable
  • There may be constraints
  • Originators may have first use
  • Reusers have responsibilities
  • Data sharing is not free

3 Dimensions of policy
[Figure labels: Public Good, The Data itself, Intellectual Property, Recognition, Access, Constraints, Discoverability, First Use, Management]

Overview
  • The Policy Context
  • OECD
  • EC/NSF/…
  • G8+5
  • RCUK
  • Royal Society
  • G8
  • PaNdata
  • Photon and Neutron Open Data Infrastructure
  • The Research Data Alliance
  • Fostering Collaboration on a global scale

[Figure labels: 250m, Daresbury Laboratory; ESRF & ILL, Grenoble]

What is STFC?
  • Programme includes:
  • Neutron and Muon Source
  • Synchrotron Radiation Source
  • Lasers
  • Space Science
  • Particle Physics
  • Computing and Data Management
  • Microstructures
  • Nuclear Physics
  • Radio Communications
  • Square Kilometre Array
  • Large Hadron Collider

The PaNdata Collaboration
  • Established 2007 with 4 partners
  • Expanded since to 13 organisations
  • (see next slide)
  • Aims:
  • “construct and operate a shared data infrastructure for Neutron and Photon laboratories...”

PaN-data Partners
PaN-data brings together 13 major European Research Infrastructures:
  • ISIS is the world’s leading pulsed spallation neutron source
  • ILL operates the most intense slow neutron source in the world
  • PSI operates the Swiss Light Source, SLS, and Neutron Spallation Source, SINQ, and is developing the SwissFEL Free Electron Laser
  • HZB operates the BER II research reactor and the BESSY II synchrotron
  • CEA/LLB operates neutron scattering spectrometers from the Orphée fission reactor
  • ESRF is a third generation synchrotron light source jointly funded by 19 European countries
  • Diamond is a new 3rd generation synchrotron funded by the UK and the Wellcome Trust
  • DESY operates two synchrotrons, Doris III and Petra III, and the FLASH free electron laser
  • Soleil is a 2.75 GeV synchrotron radiation facility in operation since 2007
  • ELETTRA operates a 2-2.4 GeV synchrotron and is building the FERMI Free Electron Laser
  • ALBA is a new 3 GeV synchrotron facility due to become operational in 2010
  • JCNS: Juelich Centre for Neutron Science
  • MaxLab: Max IV Synchrotron
PaN-data is coordinated by the STFC Department of Scientific Computing

The Science we do - Structure of materials
  • Over 30,000 user visitors each year:
  • physics, chemistry, biology, medicine,
  • energy, environmental, materials, culture
  • pharmaceuticals, petrochemicals, microelectronics
  • [Figure labels: Visit facility on research campus, Place sample in beam, Diffraction pattern from sample, Fitting experimental data to model]
  • Over 5,000 high impact publications per year
  • But so far no integrated data repositories
  • Lacking sustainability & traceability
  • [Figure labels: Longitudinal strain in aircraft wing, Hydrogen storage for zero emission vehicles, Structure of cholesterol in crude oil, Bioactive glass for bone growth, Magnetic moments in electronic storage]

PaN-data Europe – building a sustainable data infrastructure for Neutron and Photon laboratories

PaN-data Standardisation
PaN-data Europe is undertaking 5 standardisation activities:
  • Development of a common data policy framework
  • Agreement on protocols for shared user information exchange
  • Definition of standards for common scientific data formats
  • Strategy for the interoperation of data analysis software, enabling the most appropriate software to be used independently of where the data is collected
  • Integration and cross-linking of research outputs, completing the lifecycle of research, linking all information underpinning publications, and supporting the long-term preservation of the research outputs

Standards from PaNdata Support Action

PaNdata ODI Service Releases
  • Rel 1, Rel 2, Rel 3, Rel 4 (dates shown: Jun 2013, Sep 2013, Dec 2013, Mar 2014)

PaNdata ODI Service Activities
  • users, uCat, data, dCat, vLabs

PaNdata ODI Joint Research Activities
  • s/w, Prov, Integ, Pres, Scale

[Figure labels: Archival, Creation, Access, Curation, Services, Network, Storage, Compute]

[Figure: The 7 C’s. Labels: Communication, Collaboration, Creation, Collection, Data, Capacity, Curation, Computation]

[Figure labels: Metadata Collection, Authentication, and the following stages]
  • Proposal: Scientist submits application for beamtime
  • Approval: Facility committee approves application
  • Scheduling: Facility registers, trains, and schedules scientist’s visit
  • Experiment: Scientist visits, facility runs experiment
  • Data cleansing: Raw data filtered and cleansed
  • Data analysis: Tools for processing made available
  • Record Publication: Subsequent publication registered with facility
  • Credit: Bjorn Apt, PSI,

Provenance: SANS2d
[Figure labels: Experiment coordination, ISIS, OpenGenie Script, SampleTracks, Sample Information, Data Acquisition, Data Archive, Data Processing, raw data, (Extended) ICAT Data Catalogue, DOIs, British Library DOI Server, Outputs, derived data, New links, Publications, ELN]
  • Credit: Brian Matthews, STFC,

Linking the software application into the research object
[Figure labels: Software Repository, :inputDataset, :dataset, :investigator, :application, Investigation 1, :outputDataset, :relatedDataset, :sample, :instrument, cito:cites, :publication]
  • Own metadata format (CSMD)
  • W3C Prov ontology
  • Assume that the software is in a repository
  • Credit: Brian Matthews, STFC,
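
The linkage on this slide can be sketched as a small set of provenance triples. This is a minimal illustration only, not the CSMD or ICAT implementation: the node names reuse the figure's labels under a made-up `ex:` prefix, and the mapping onto W3C PROV and CiTO terms (`prov:used`, `prov:wasGeneratedBy`, `prov:wasAssociatedWith`, `prov:wasDerivedFrom`, `cito:cites`) is an assumption chosen for the example.

```python
from collections import defaultdict

# Hypothetical triples: the "ex:" names echo the figure labels and are not
# real identifiers; the choice of PROV/CiTO predicates is an assumption.
TRIPLES = [
    ("ex:investigation1", "prov:used", "ex:inputDataset"),
    ("ex:investigation1", "prov:wasAssociatedWith", "ex:investigator"),
    ("ex:investigation1", "prov:wasAssociatedWith", "ex:application"),
    ("ex:outputDataset", "prov:wasGeneratedBy", "ex:investigation1"),
    ("ex:outputDataset", "prov:wasDerivedFrom", "ex:inputDataset"),
    ("ex:publication", "cito:cites", "ex:outputDataset"),
]


def build_index(triples):
    """Index triples by subject so outgoing links can be followed."""
    index = defaultdict(list)
    for subject, predicate, obj in triples:
        index[subject].append((predicate, obj))
    return index


def trace(start, triples):
    """Return every node reachable from `start` by following links forward."""
    index = build_index(triples)
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for _predicate, obj in index.get(node, []):
            if obj not in seen:
                seen.add(obj)
                stack.append(obj)
    return seen


if __name__ == "__main__":
    # Everything that underpins the publication, including the software used.
    for node in sorted(trace("ex:publication", TRIPLES)):
        print(node)
```

Starting from the publication, the trace reaches the output dataset, the investigation, the input dataset, the investigator and the software application, which is the point of linking the application into the research object. It only works if the software has an identifier to point at, hence the slide's assumption that the software is in a repository.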

Tomographic Reconstruction
  • ~100 GB per 3D image: ~40 mins on 16 GPU cluster
  • ~10 TB per experiment: ~3 days on site
  • ~1 PB per year (per beamline)
  • Working on using the Emerald (376 GPUs)
  • Credit: Mark Basham, Diamond,
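
The figures on this slide imply some rough ratios, worked out in the sketch below. It assumes decimal (SI) units, takes the per-image size as about 100 gigabytes, and assumes images are reconstructed one after another on the 16 GPU cluster. The names are illustrative and the results are order-of-magnitude estimates only.

```python
# Approximate figures quoted on the slide (decimal units assumed).
GB = 10**9
TB = 10**12
PB = 10**15

IMAGE_BYTES = 100 * GB       # ~100 GB per 3D image (unit is an assumption)
EXPERIMENT_BYTES = 10 * TB   # ~10 TB per experiment
YEAR_BYTES = 1 * PB          # ~1 PB per year, per beamline
MINUTES_PER_IMAGE = 40       # ~40 mins per image on a 16 GPU cluster


def images_per_experiment():
    """Number of 3D images implied by the per-experiment data volume."""
    return EXPERIMENT_BYTES // IMAGE_BYTES


def experiments_per_year():
    """Number of experiments implied by the yearly volume of one beamline."""
    return YEAR_BYTES // EXPERIMENT_BYTES


def reconstruction_days_per_experiment():
    """Days needed to reconstruct one experiment's images sequentially."""
    total_minutes = images_per_experiment() * MINUTES_PER_IMAGE
    return total_minutes / (60 * 24)


if __name__ == "__main__":
    print(images_per_experiment())                          # 100
    print(experiments_per_year())                           # 100
    print(round(reconstruction_days_per_experiment(), 1))   # 2.8
```

On these assumptions, reconstructing one experiment's images takes a little under three days of cluster time, about as long as the ~3 days spent on site collecting them.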

ESRF example: Amber inclusion, Prioriphora schroederhohenwarthi
  • X-ray imaging of a 1 mm Prioriphora (scuttle fly) from the Cretaceous period
  • found at Archingeay-Les Nouillers in opaque amber
  • Solorzano et al., Systematic Entomology (2011)

Overview
  • The Policy Context
  • OECD
  • EC/NSF/…
  • G8+5
  • RCUK
  • Royal Society
  • G8
  • PaNdata
  • Photon and Neutron Open Data Infrastructure
  • The Research Data Alliance
  • Fostering Collaboration on a global scale

3. The Research Data Alliance
  • New international organization
  • Currently supported by: EU, NSF, Australian National Data Service
  • To accelerate data-driven innovation through research data sharing and exchange.
  • Infrastructure, Policy, Practice and Standards

Research Data Alliance: Vision and Purpose
  • Vision: Researchers around the world sharing and using research data without barriers.
  • Purpose: … to accelerate international data-driven innovation and discovery by facilitating research data sharing and exchange, use and re-use, standards harmonization, and discoverability. …through the development and adoption of infrastructure, policy, practice, standards, and other deliverables.

RDA Principles
  • Openness
  • Membership is open to all interested organizations,
  • all meetings are public,
  • RDA processes are transparent, and
  • all RDA products are freely available to the public;
  • Consensus
  • The RDA moves forward by achieving consensus and
  • resolves disagreements through appropriate voting mechanisms;
  • Balance
  • The RDA is organized on the principle of balanced representation for individual organizations and stakeholder communities;
  • Harmonization
  • The RDA works to achieve harmonization across standards, policies, technologies, tools, and other data infrastructure elements;
  • Voluntary
  • The RDA is not a government organization or regulatory body and, instead, is a public body responsive to its members; and
  • Non-profit
  • RDA is not a commercial organization and will not design, promote, endorse, or sell commercial products, technologies, or services.  
  • “Building Bridges”
  • Bridges to the future
  • data preservation
  • Bridges to research partners
  • Bridges across disciplines
  • Bridges across regions
  • Bridges to integration
  • to solve new problems
  • Bridges across communities

RDA role
  • Two bridges we can build:
  • Connecting Data
  • Connecting People

What kind of organisation do we need to do this?
[Figure labels: Council (Strategy), Technical Advisory Board (Workplan), Secretary General (Operating Plan), Organisational Advisory Board (Procedures); RDA Bodies: Individual Membership, Secretariat, Organisational Membership, Task Groups, Members of Staff, Organisations; Administrative Domain, Procedural Domain, Technical Domain, Data Practitioners Domain]
  • Online Open Interaction Fora
  • - use for all kinds of activities, open to all RDA members
  • Working Groups and Interest Groups
  • - Carry out work of RDA
  • - Reach consensus on outputs
  • May suggest BoFs about new topics
  • Open to all but…
  • some commitment expected
  • Administration and Management Team
  • Implement strategic direction set by council
  • Support the activities of the RDA
  • Arrange plenary meetings
  • Run the on-line fora
  • Manage documents
  • Convene nominating committees for Council and TAC
  • Monitor and control finances
  • Prepare reports for Council, funders, …
  • Plenary
  • Open to all persons involved in RDA
  • Hears and comments on reports from WGs
  • Suggests new IGs and WGs
  • Hears candidates for TAC
  • Technical Advice Committee
  • - advise on WG work activities
  • - Interacting directly with working groups
  • advise on new WGs and new BoFs
  • Give implementation suggestions to strategic direction from council
  • Council
  • - Set strategic direction
  • - Final vote on governance matters
  • Approve new WGs (TAC advised)
  • control balanced WG approach
  • Example RDA Working Groups
  • Data Citation
  • Data Foundation and Terminology
  • Data Type Registries
  • Metadata Standards
  • PID Information Types
  • Practical Policy
  • Standardisation of Data
  • Some Risks
  • Standardisation is easy, I’ve done it a hundred times
  • (apologies to Mark Twain)
  • Two easy ways to standardise:
  • The Imperial model
  • The Esperanto model
  • Justify need, define benefit, involve stakeholders
  • Make small steps and reassess
  • “Never generalise from one example”
  • Supporting Projects
  • Three projects supporting RDA through its first phase:
  • RDA/Europe (previously iCordi), EC Project
  • RDA/US NSF Project
  • Support in Australia through ANDS
  • Steering Group setting it up:
  • US – Fran Berman, Beth Plale
  • EU – Leif Laaksonen, Peter Wittenburg, Juan Bicarregui
  • Australia – Ross Wilkinson, Andrew Treloar
  • TAB to be elected at 2nd Plenary
  • First Organisational Assembly at 2nd Plenary
  • RDA Status in June 2013
  • Pre-launch meetings in Munich and Washington September 2012,
  • ~200 Delegates
  • Various Workshops eg through eIRG, IDCC, ….
  • Launch and First Plenary, March 2013, Gothenburg,
  • ~250 participants
  • Currently, 8 Working Groups and 14 Interest Groups
  • Second Plenary, September 16-18 2013, Washington
  • Third Plenary, March 26-28, 2014, Dublin
  • Fourth Plenary, TBD
  • Please get involved by registering and participating in the discussions:
  • Website:

[Figure labels: RDA, Policy Initiatives, Disciplinary Initiatives]

The Innovation Lifecycle
[Figure labels: Enabling Wealth Creation, Enabling Knowledge Creation, The Body of Knowledge, The Research Process, The Government Process, Improved Understanding, Improved Quality of Life]
Aggregation of Knowledge lies at the heart of the innovation lifecycle

Overview
  • The Policy Context
  • OECD
  • EC/NSF/…
  • G8+5
  • RCUK
  • Royal Society
  • G8
  • PaNdata
  • Photon and Neutron Open Data Infrastructure
  • The Research Data Alliance
  • Fostering Collaboration on a global scale

Thank you