As usual, anything inside 【】 is my own side commentary.
Consumer data
“Data concerning consumers, where such data have been collected, traded or used as part of a commercial relationship” (OECD, 2020)
Data access/accessibility
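Usually discussed together with open government data, and therefore naturally mentioned alongside transparency, usability, discoverability, and privacy. The reason access comes up at all is that obstacles were encountered when trying to get value out of data; wanting to remove those obstacles is what leads to the idea of opening data in the first place. When the OECD mentions open data, it treats it as an approach.
Data accessibility measures the extent to which government data are provided in open and re-usable formats, with their associated metadata. Core features of accessible data include providing them free of charge, with unrestricted access, and in machine-readable formats. (OECD, 93c6d805-en)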
A large number of countries have created local or national government data portals in order to provide access to open government datasets. [………] Global open data index tracks whether published data is actually released in a way which is accessible to all stakeholders, and measures the openness level of data globally. 【……..】Publishing data and making it accessible qualifies as ‘open data’. 【……..】As identified by the authors of (Ochoa & Duval, 2006), the accessibility quality dimension has two measures: (1) The cognitive accessibility defines how easy it is for a data consumer to understand the published information; (2) The second measure is the psychological or logical accessibility, which can be defined as the ease with which the relevant dataset is discovered through a data catalogue or repository. (https://www.sciencedirect.com/science/article/pii/S0740624X1500091X#bb0255)
Data accessibility determines the extent to which data consumers in an organization can access and utilize data to achieve the organizational goals, increase productivity and efficiency without requiring advanced know-how and experience in working with data. (https://thinkinsights.net/data/data-accessibility/)
In promoting data access and sharing policy makers will thus need to take account of: i) the sensitivity of the data and the degree by which personal data could be re-identified; ii) the overlapping rights and interests of all relevant stakeholders; and iii) the manner by which data are generated, in order to better take into account the contributions of the various stakeholders in the creation of that data. (https://www.oecd-ilibrary.org/sites/276aaca8-en/index.html?itemId=/content/publication/276aaca8-en)
Data accountability
An individual or organization is accountable for “open data” when they are answerable for the act(s) of making data open, whatever those acts might be. (https://journals.sagepub.com/doi/pdf/10.1177/2053951717718853)
On the definition of accountability itself: It is often interpreted as a form of stewardship and/or responsibility involving account giving. ‘Why’ and ‘for whom’ we give an account can have significant implications for the ‘way we account’. [This is why one so often hears of accountability-based performance measurement systems, and why accountability naturally matters a great deal in PPPs too] Accountability is often associated with the execution of responsibilities and being answerable for them. [This is why, as I recall, in one diagram that plots accountability and transparency as axes, being answerable is a key indicator] ……….. The author goes on to mention that Dubnick and Romzek describe it as the management of expectations/perceptions of different stakeholders. (https://link.springer.com/article/10.1007/s10997-009-9109-6)
Accountability made its formal debut in the field of international data protection more than 30 years ago, when it was adopted as a data protection principle in the OECD. 【Interesting: it turns out to be tied to data protection. For example, the processing of personal data always comes with an increase in the data controller's power over the data subject, and data protection rules are then drawn up to protect individuals' data against abuse of that power by the controller. These protections are closely linked to stronger data accountability, which is why we so often see accountability listed among the data protection principles in, for example, the OECD guidelines.】 “While accountability is a notoriously difficult concept to define, most contemporary definitions include two key elements: the conferring of responsibility and authority, and the answering for the use of that authority.” (from the book Managing Privacy through Accountability, p. 50)
Data collaboratives
Six types of data collaboratives: data cooperatives or pooling, prizes & challenges, research partnerships, intelligence products, application programming interfaces (APIs), trusted intermediaries. Five ways data collaboratives create public value: situational awareness and response, public service design and delivery, knowledge creation and transfer, prediction and forecasting, impact assessment and evaluation (Stefaan Verhulst, SlideShare, Better Data for Better Policy)
Data compatibility
(data-conversion.org) Data Compatibility, in IT terms, means that data is integrated throughout an organization, among organizations, and across industries.
Data democratization/democratizing data
Data democracy tends to be discussed together with imbalanced data / data imbalance when making decisions. 【Hmm, this seems to have turned into "democracy"; so how is that different from "democratization"?】
Data Democratization is the act of opening organizational data to as many employees as possible, given reasonable limitations on legal confidentiality and security. Data Democratization allows data to transition from the hands of a select few employees into the hands of the masses within a firm. [……….] Data Democratization derives its core principles from related movements like Data Philanthropy and Open Data, in that it aims to increase data access and reduce gatekeepers for data stores. By the author's definition, this means data can be used by non-technical and non-specialist employees. 【My small question here is why only employees are mentioned. The literature the author goes on to cite is about employee empowerment, about serving the company, and about data culture inside a company or institution; the author then explains it through the resource-based view and resource dependence theory, and gives several examples, such as Airbnb's Data University.】 (https://core.ac.uk/reader/326836340)
Data democratization is the ongoing process of enabling everybody in an organization, irrespective of their technical know-how, to work with data comfortably, to feel confident talking about it, and as a result, make data-informed decisions and build customer experiences powered by data. (https://towardsdatascience.com/what-the-heck-is-data-democratization-39b86eb27aa6)
Data governance
Much of the literature uses the DAMA International (2009) definition, namely the “exercise of authority and control over the management of data”.
(data-conversion.org) Data governance is the discipline of cataloging and defining important data, assigning ownership of data, and incorporating the administration of data into the everyday business process.
Data governance is defined as the process by which stewardship responsibilities are conceptualized and carried out, that is, the policies and approaches that enable stewardship. Data governance establishes the broad policies for access, management, and permissible uses of data; identifies the methods and procedures necessary to the stewardship process; and establishes the qualifications of those who would use the data and the conditions under which data access can be granted.(https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2965885/)
Data governance refers to who holds the decision rights and is held accountable for an organization’s decision-making about its data assets. Our framework for data governance includes five interrelated decision domains: Data principles (establish the direction/boundary requirement); Data quality; Metadata (how data is interpreted); Data access; and Data lifecycle. (https://dl.acm.org/doi/pdf/10.1145/1629175.1629210)
Data Governance is a system of decision rights and accountabilities for information-related processes, executed according to agreed-upon models which describe who can take what actions with what information, and when, under what circumstances, using what methods. (DGI, Data Governance Institute) In addition, in the book [Data Stewardship: An Actionable Guide to Effective Data Management and Data Governance, Plotkin, 2013], the author notes that Gwen Thomas of the DGI explains data governance as the “exercise of decision-making and authority for data-related matters”, and builds on this to say that the point is to set out roles and responsibilities for how people manage and make decisions about data, rather than to govern the data itself. In other words, data governance has to ensure that data is of good quality and is used properly, meeting compliance requirements and “helping utilize data to create public value”.
Janssen et al., in their article Data governance: Organizing data for trustworthy Artificial Intelligence, define data governance as “organizations and their personnel defining, applying and monitoring the patterns of rules and authorities for directing the proper functioning of, and ensuring the accountability for, the entire life-cycle of data and algorithms within and across organizations. Structurally, data governance is exercised through policies, incentives and sanctions, as needed to create an organizational culture where data is treated as an asset, and behaviour that supports or violates this treatment is rewarded or sanctioned respectively.” This in turn covers data standardization (the importance of metadata interoperability and the like), data monitoring, risk assessment, monitoring of the data life-cycle, and so on.
Abraham et al., in their article Data governance: A conceptual framework, structured review, and research agenda, give this definition: “Data governance specifies a cross-functional framework for managing data as a strategic enterprise asset. In doing so, data governance specifies decision rights and accountabilities for an organization’s decision-making about its data. Furthermore, data governance formalizes data policies, standards, and procedures and monitors compliance.” First, it is cross-functional, and can reach beyond boundaries and data subject areas; second, it is a framework, providing structure and formalization for data management; third, it treats data as a strategic enterprise asset 【I have my doubts about this definition of "asset"】; fourth, it emphasizes decision rights and accountability, that is, what decisions need to be made about data, how they are made, and who has the right to make them; fifth, it has to be implemented through data policies, standards, and procedures; finally, it requires monitoring compliance, which includes ensuring that data policies and standards are actually implemented as intended. 【The agenda this article proposes is quite interesting. It is split into five areas: (1) governance mechanisms; (2) scope of data governance; (3) antecedents of data governance; (4) consequences of data governance; and (5) generalizability and replicability of findings. Some points it raises: ownership and accountability for data; the definition of the data owner (who decides, who defines it, where the boundaries lie, how decision-making bodies adapt, how the concept gets updated); how to ensure data ownership and control, and the ethical and permissible use of big data, in data collaboration or sharing; the question of data value and how to measure it; and so on.】
(TDAN) Data Governance: The execution and enforcement of authority over the management of data assets and the performance of data functions. [the data do not govern itself]. […………….] In today’s analytics-driven society, the public sector can transform this historic information to reduce operational costs and improve public service to better address the needs of a given community. Data governance is the foundation for these strategies. To unlock your data’s value, you need a data governance program that addresses: data ownership, privacy concerns, data breach mitigation measures, dataset availability and integrity, transparency over data usage.
Data intermediaries
Data intermediaries enable data holders to share their data, so it can be re-used by potential data users. They may also provide additional added-value services such as data processing services, payment and clearing services and legal services, including the provision of standard-licence schemes. There are a wide variety of types of data intermediaries. The most popular types are data repositories and data brokers. Data repositories, which sometimes are also referred to as data libraries or data archives, preserve data as a resource of knowledge for society. The core business objective of data brokers is to collect and aggregate data, including personal data (https://www.oecd-ilibrary.org/sites/276aaca8-en/1/2/2/index.html?itemId=/content/publication/276aaca8-en&csp=a1e9fa54d39998ecc1d83f19b8b0fc34&itemIGO=oecd&itemContentType=book)
Data Interoperability
Interoperability refers to the functionality of information systems to exchange data and to enable sharing of information. (https://edps.europa.eu/data-protection/our-work/subjects/interoperability_en)
Data interoperability refers to the ways in which data is formatted that allow diverse datasets to be merged or aggregated in meaningful ways. (NNLM)
(OECD report) Interoperability measures are distinct but related to data portability, in that they focus on allowing systems to communicate with one another. Depending on their design, interoperability measures can promote competition among digital platforms, by allowing users to preserve network effects on new services, and within digital platforms, by allowing users to mix and match different complementary services from different providers. In other words, the difference from portability is that portability is a user-initiated data transfer, whereas interoperability takes the perspective of the service providers.
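The NNLM definition above, where formatting lets diverse datasets be merged in meaningful ways, can be illustrated with a minimal sketch. Everything here is hypothetical: the two agencies, their field names, the values, and the helper names `to_common` and `merge_on` are made up for illustration and do not come from any of the cited sources.

```python
# Hypothetical mappings from each source's field names to one shared vocabulary.
FIELD_MAP_A = {"region_id": "region", "pop": "population"}
FIELD_MAP_B = {"RegionCode": "region", "Hospitals": "hospital_count"}


def to_common(rows, field_map):
    """Rename the fields of each row to the shared vocabulary, dropping unmapped ones."""
    return [
        {field_map[key]: value for key, value in row.items() if key in field_map}
        for row in rows
    ]


def merge_on(key, *tables):
    """Combine rows from several tables that share the same value for `key`."""
    merged = {}
    for table in tables:
        for row in table:
            merged.setdefault(row[key], {}).update(row)
    return list(merged.values())


# Made-up records from two sources that describe the same regions differently.
agency_a = [{"region_id": "R1", "pop": 1200}, {"region_id": "R2", "pop": 800}]
agency_b = [{"RegionCode": "R1", "Hospitals": 3}]

combined = merge_on(
    "region",
    to_common(agency_a, FIELD_MAP_A),
    to_common(agency_b, FIELD_MAP_B),
)
print(combined)
```

The merge only becomes possible once both sources are expressed in the same field names, which is the practical point of agreeing on formats and vocabularies up front.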
Data ownership
Establishing ownership requires understanding how the data is collected and who used it, then determining who can best be responsible for the content and quality of the data elements. (https://books.google.com.hk/books?hl=zh-TW&lr=&id=ocfrDwAAQBAJ&oi=fnd&pg=PP1&dq=data+ownership+and+data+stewardship&ots=X1Jmz_F62q&sig=2tdAq88pRn7lxtpsVELXpxYb6BQ&redir_esc=y#v=onepage&q=data%20ownership%20and%20data%20stewardship&f=false)
As data are generated, then data are stored. When we speak about data ownership, we refer to the storage process. If so, then the ownership of data storage resides with the owner of the storage. Thus, we as individuals, the government as our governing agent, law enforcement agencies and the courts, security agencies, our service providers, and our network operators who enable us to move our data are all our data owners. (from Data Ownership: Who Owns ‘My Data’?, 2011)
Despite the absence of a unified view on the subject, data ownership is widely considered as the possession of complete control over the data and its rights including, but not limited to access, creation, generation, modification, analysis, use, sell, or deletion of the data, in addition to the right to grant rights over the data to others. One of the major complications is the perception of data ownership and the willingness of data holders to share their data. Data ownership represents a major concern that influences the beneficial use and share of information. (Data Ownership: A Survey, 2021)
Data philanthropy
Data philanthropy, understood as the donation of data from both individuals and private companies. (https://heinonline.org/hol-cgi-bin/get_pdf.cgi?handle=hein.journals/hastlj70&section=68)
The recent popularity of organizational data and analytics has crossed over into corporate responsibility programs in the form of data philanthropy, sometimes known as data collaborations. It is usually initiated by corporate donors who want to leverage their data capabilities to advance social good (Singh, 2016). [………] Data philanthropy is different than but closely relates to corporate philanthropy. Data is a strategic VRIN resource of the firm, unlike the cash or in-kind gifts that are more common in corporate philanthropy. (https://www.tandfonline.com/doi/full/10.1080/10580530.2020.1696587)
Data portability
(OECD, 2021) Data portability has been identified as a procompetitive measure to empower consumers to choose among competing providers. A separate report on data and competition notes that data portability measures aimed at promoting competition seek to reduce user switching costs and reduce the frictions associated with trying new services, so that new market entrants can attract users and the costs associated with data access may come down; it also states that data portability and interoperability are both at the core of data competition policy. ///// Yet another report describes it as “the ability (sometimes described as a right) of a natural or legal person to request that a data holder transfer to the person, or to a specific third party, data concerning that person in a structured, commonly used and machine-readable format on an ad-hoc or continuous basis.”
Data portability generally refers to a user’s ability to easily download or transfer their personal data from a digital platform in an organized and machine-readable format. (bipartisanpolicy.org)
In a few countries, data portability with a focus on consumer data is emerging as another policy means for promoting data access and sharing in the private sector. [……] Data portability, which provides restricted access to those involved in the creation and collection of data, such as data subjects in respect to their personal data. It promises to empower users by giving them more control over their data, but it may also expose them to new risks. [………….] Data portability is often regarded as a promising means for promoting cross-sectoral re-use of data while strengthening the control rights of individuals over their personal data and businesses (in particular small and medium-sized enterprises [SMEs]) over their business data. In short, the aims are, first, to reduce the information asymmetry between individuals and data providers; second, to reduce individuals' switching costs and lower lock-in effects; and third, to potentially reduce barriers to market entry. 【This report gives a data “openness” continuum; the more commonly used points on it are contractual agreements, open data, data portability, and restricted data-sharing arrangements.】 (https://www.oecd-ilibrary.org/sites/276aaca8-en/index.html?itemId=/content/publication/276aaca8-en)
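As a rough illustration of what "a structured, commonly used and machine-readable format" can mean in practice, here is a minimal sketch of a user-initiated export to JSON. The in-memory `store`, its layout, and the function name `export_user_data` are all assumptions made for this example; real platforms will hold and expose data very differently.

```python
import json


def export_user_data(store, user_id):
    """Collect everything held about one user and serialize it as JSON."""
    bundle = {
        "user_id": user_id,
        "profile": store["profiles"].get(user_id, {}),
        "orders": [order for order in store["orders"] if order["user_id"] == user_id],
    }
    # sort_keys gives a stable layout that another service can parse reliably.
    return json.dumps(bundle, indent=2, sort_keys=True)


# Hypothetical platform data, made up for illustration.
store = {
    "profiles": {"u1": {"name": "Alice"}, "u2": {"name": "Bob"}},
    "orders": [
        {"user_id": "u1", "item": "book"},
        {"user_id": "u2", "item": "pen"},
    ],
}

exported = export_user_data(store, "u1")
print(exported)
```

Because the result is plain JSON, the user (or a third party the user designates) can load it without any knowledge of the original platform's internals, which is what lowers switching costs.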
Data Sharing
In 2019 the OECD studied more than 200 policy initiatives across 37 countries and found that roughly 65% of them focus on public-sector data, while fewer countries have put in place policies to promote data sharing between the public and private sectors, even though this is a frequently cited challenge. “Data sharing” refers to the provision of data by the data holder, on a voluntary basis. It includes the re-use of data based on commercial and non-commercial conditional data-sharing agreements, as well as open data. Data sharing assumes common interests between the entities agreeing to share their data, including the interest and expectation that data holders can become data users and vice versa. Data sharing therefore can come with an expectation of some kind of reciprocity among the stakeholders engaged in data-sharing agreements. […………….] In data partnerships organisations agree to share and mutually enrich their data sets, including through cross-licensing agreements. One big advantage is the facilitation of joint production or co-operation with suppliers, customers (consumers) or even potential competitors. 【This also touches on the government's multiple roles in PPPs: for example, how to balance being an authority against being a service/data provider, and which rules should actually apply.】
Data Silos
Data silos is the phenomenon where organizational units hoard data assets. Such data are accessible only by the department that owns that data, but are opaque to the rest of the organization, even if that data could also benefit the other departments. (https://thinkinsights.net/data/data-accessibility/)
Data Stewardship
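In short, it is the set of responsibilities, rules, and mechanisms for day-to-day data management. What TDAN stresses is that data does not govern itself, and data cannot explain itself either. Data is shared and used by any... Hence the questions: who owns the data, who makes the decisions, who is responsible when the data is wrong, and who is able to decide.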
Data stewardship can be thought of as a collection of data management methods covering acquisition, storage, aggregation, and deidentification, and procedures for data release and use. [………..] A data steward assumes the availability of data to be stewarded. In this regard, the environment is clouded. (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2965885/)
Stewardship is akin to accountability. Data Stewardship: The formalization of accountability for the management of data resources. The Data Stewardship Approach to Data Governance involves getting the “right” people involved in making the “right” decisions over the management of the “right” data at the “right” time for the “right” purpose. (https://tdan.com/the-data-stewardship-approach-to-data-governance-chapter-1/5037)
A data steward is responsible for carrying out data usage and security policies as determined through enterprise data governance initiatives. (techtarget)
Data stewardship is the operational aspect of data governance, where most of the day-to-day work of data governance gets done. It consists of the people, organization, and processes needed to ensure that the appropriately designated stewards are responsible for the governed data (https://books.google.com.hk/books?hl=zh-TW&lr=&id=ocfrDwAAQBAJ&oi=fnd&pg=PP1&dq=data+ownership+and+data+stewardship&ots=X1Jmz_F62q&sig=2tdAq88pRn7lxtpsVELXpxYb6BQ&redir_esc=y#v=onepage&q=data%20ownership%20and%20data%20stewardship&f=false)
Data transparency
Transparency also means that stakeholders not only can access the data, but they also should be enabled to use, reuse and distribute it.
Transparency refers to the notion that information about an individual or organization’s actions can be seen from the outside. On its relationship to access: the transparency concept clearly implies accessibility—if something is not accessible, it cannot be transparent—but providing access does not itself make something transparent. (https://journals.sagepub.com/doi/pdf/10.1177/2053951717718853)
Data Value (Chain)
The value that is derived from processing the data using different analytics that contributes to problem solving. (IGI Global)
First-Party/Second-Party/Third-Party Data
(signal.co) First-party data is information a company collects directly from its customers and owns. First-party data (also known as 1P data) is part of the mosaic of data marketers have at their disposal. It can complement, enhance, and reduce the need for other types of data. Examples include web and mobile app behavior, purchase history, and loyalty status.
Second-Party Data is first-party data from a trusted partner. This data can help a company achieve greater scale than relying on its own data alone, and because the data isn’t sold openly, it can provide greater value than third-party data, which is usually available to anyone who wants to buy it. For example, a credit card company might obtain consumer information from an airline, or a publisher might share information about its readers with advertisers who want to advertise on its website.
Third-Party data. Unlike first-party data, third-party data usually comes not from the direct relationship between a customer and a company, but an outside source that has collected the data. Third-party data often comes from a variety of sources across the web, and this data is then aggregated, segmented, and sold to companies for their own advertising use.
Public-sector data and private-sector data
Public-sector data and private-sector data are often mistakenly used as synonyms for public (domain) data and private (proprietary) data, respectively: However, data produced and controlled by the public sector is usually proprietary data at first, before being put in the public (domain) thanks to open data initiatives (see subsection “Open data”). Similarly, even if most data produced and controlled by the private sector can be considered proprietary (private) data, some data in the private sector may remain in the public domain, for instance if they are open data. The distinction (private sector vs. public-sector data) does not fully reflect the data of households, most of which is personal data. So if we divide by domain instead, there are the personal domain, the private domain, and the public domain, and data sitting in the overlaps between domains may become proprietary personal data, public personal data, or publicly funded data (https://www.oecd-ilibrary.org/sites/276aaca8-en/1/2/2/index.html?itemId=/content/publication/276aaca8-en&csp=a1e9fa54d39998ecc1d83f19b8b0fc34&itemIGO=oecd&itemContentType=book)
FAIR principle
Findability, Accessibility, Interoperability, and Reusability. https://www.nature.com/articles/sdata201618
To be Findable: F1. (meta)data are assigned a globally unique and persistent identifier; F2. data are described with rich metadata (defined by R1 below); F3. metadata clearly and explicitly include the identifier of the data it describes; F4. (meta)data are registered or indexed in a searchable resource.
To be Accessible: A1. (meta)data are retrievable by their identifier using a standardized communications protocol; A1.1 the protocol is open, free, and universally implementable; A1.2 the protocol allows for an authentication and authorization procedure, where necessary; A2. metadata are accessible, even when the data are no longer available.
To be Interoperable: I1. (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation. I2. (meta)data use vocabularies that follow FAIR principles. I3. (meta)data include qualified references to other (meta)data.
To be Reusable: R1. (meta)data are richly described with a plurality of accurate and relevant attributes; R1.1. (meta)data are released with a clear and accessible data usage license; R1.2. (meta)data are associated with detailed provenance; R1.3. (meta)data meet domain-relevant community standards.
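To make the checklist above a little more concrete, here is a rough sketch of how a few of the sub-principles could be turned into automated checks on a metadata record. This is only a heuristic under stated assumptions: the `DatasetRecord` fields, the `fair_gaps` helper, the accepted protocol list, and the example identifier are all hypothetical, and several principles (for instance I1 to I3 and R1.3) are not covered because they need human or community judgment.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    """Hypothetical metadata record; the field names are illustrative only."""
    identifier: str = ""                            # F1: persistent identifier
    metadata: dict = field(default_factory=dict)    # F2: descriptive metadata
    indexed_in: list = field(default_factory=list)  # F4: searchable resources
    access_protocol: str = ""                       # A1: retrieval protocol
    license: str = ""                               # R1.1: usage license
    provenance: str = ""                            # R1.2: provenance note


def fair_gaps(record):
    """Return the sub-principles this record visibly fails (rough heuristic)."""
    checks = {
        "F1": bool(record.identifier),
        "F2": len(record.metadata) > 0,
        # F3: the metadata must explicitly carry the identifier it describes.
        "F3": bool(record.identifier)
        and record.metadata.get("identifier") == record.identifier,
        "F4": len(record.indexed_in) > 0,
        # A1: assumed list of standardized, open protocols for this sketch.
        "A1": record.access_protocol.lower() in {"http", "https", "ftp"},
        "R1.1": bool(record.license),
        "R1.2": bool(record.provenance),
    }
    return [name for name, passed in checks.items() if not passed]


# Made-up example record with no provenance information.
record = DatasetRecord(
    identifier="doi:10.0000/example",
    metadata={"identifier": "doi:10.0000/example", "title": "Example dataset"},
    indexed_in=["example-catalogue"],
    access_protocol="https",
    license="CC-BY-4.0",
)
print(fair_gaps(record))  # ['R1.2']
```

A check like this can only flag what is missing; whether the metadata is actually "rich" or the provenance "detailed" still has to be judged by people.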
FATE (AI)
Four interrelated characteristics of responsible AI: fairness, accountability, transparency, and explainability.