Workshop on large databases in social and economic complex systems

Program and Abstracts

Wednesday, September 17th

14:10 14:50 Alan Kirman – The advantages and disadvantages of large market data bases: The case of perishable goods and financial markets.

Groupement de Recherche en Economie Quantitative d'Aix-Marseille, Marseille, France.

Abstract: By large databases we typically mean ones with many observations. However, in markets the size may be generated by the number of dimensions of each observation. In the perishable goods markets that we have studied, such as that for fish, we have many characteristics for each transaction, and for each of the problems that we have analysed we had to abandon some of these dimensions. To what extent are we conditioning our results on the choice of omitted variables? This recalls the "identification" problem raised by Manski for social interaction models. Secondly, the frequency of observation may change the nature of the appropriate model. In financial markets, high-frequency data are used in conjunction with "temporary market equilibrium" models. But one can ask whether this is appropriate, since each observation reflects an individual transaction and not a market clearing price in any standard sense. If we are to use this sort of data appropriately, we would do better to model an order book explicitly, and I will suggest how this might be done.

14:50 15:30 János Kertész – Searching people's digital footprints

Institute of Physics, Budapest University of Technology and Economics, Budapest, Hungary.

Abstract: Recent developments in information technology, together with multidisciplinary efforts, have opened a new avenue in the social sciences. Instead of relying on the classical tool of a rather limited number of questionnaires, research focuses on the "digital footprints" of people, i.e., on the electronic data we leave behind in almost all our activities, from communicating to working and from shopping to leisure. These data reflect the social interactions, habits and attitudes of people, and their proper evaluation gives new insight into the structure and dynamics of society. The use of such data is a scientific challenge and, at the same time, raises ethical problems. Recent studies on databases of email and phone communications are typical examples of this kind. Some data are publicly available, like the Enron email data set; others can be collected by software tools; and some are difficult to obtain because of commercial interests and/or privacy issues. Results based on financial data, on the Enron email data set and on records of mobile phone conversations will be shown.

References:

Z. Eisler, I. Bartos, J. Kertész: Fluctuation scaling in complex systems: Taylor's law and beyond, Advances in Physics 57, 89-142 (2008).

J.-P. Onnela, J. Saramäki, J. Hyvönen, G. Szabó, D. Lazer, K. Kaski, J. Kertész, A.-L. Barabási: Structure and tie strengths in mobile communication networks, PNAS 104, 7332-7336 (2007).

J.-P. Onnela, J. Saramäki, J. Hyvönen, G. Szabó, M. Argollo de Menezes, K. Kaski, A.-L. Barabási, J. Kertész: Analysis of a large-scale weighted network of one-to-one human communication, New J. Phys. 9, 179 (2007).

15:30 16:10 Robert F. Boruch – Ethics, Evidence Grading Systems, and Evidence Based Decision Making in Complex Systems Research

Wharton School, University of Pennsylvania, Philadelphia, Pennsylvania, USA.

Abstract: Boruch's segment of the workshop will consider three topics that are pertinent to the conference themes. The presentation briefly considers ethics related to individual privacy in research, the ethics of scientific inquiry, and resolving the tensions between them. Evaluating the quality of evidence is construed as part of the scientific ethic, and various systems for screening and grading evidence will be considered. This covers some international, national, and state/provincial systems, and a limited focus on certain research designs. The last part of the presentation concerns practical and theoretical conditions that enhance the likelihood that evidence will be used in policy and practice contexts.

16:10 16:50 Esther Adi-Japha – Large databases vs. individual analysis: Two complementary approaches in the study of education and learning

School of Education, Bar-Ilan University, Ramat-Gan, Israel.

Abstract: In this talk I will discuss two methods for understanding the way children learn and the factors that affect their learning. The first method concerns the most comprehensive child care study conducted to date to determine how variations in child care are related to children's development. This large-scale study was conducted in the USA, and its data are available for secondary analysis. In the first part of the talk I will review the major findings of this study and briefly describe a secondary analysis we conducted. For various reasons concerned with data assessment and data sharing procedures, this study, similar to other large-scale studies, does not allow inferences regarding the development of individuals. However, it is becoming widely accepted that group data may mask critical phases in the individual's development. The second method presented in this talk concerns small-scale studies that describe the learning of a simple motor task. Results of several studies that extend over hours, days, and weeks of practice on this specific task, a task that has been extensively studied as a model for skill learning, suggest that learning is not a smooth, continuous process, but is rather composed of discrete phases. In the talk I will review developmental results and their possible implications for curricular planning.

References:

NICHD Early Child Care Research Network. (2006). Child-care effect sizes for the NICHD Study of Early Child Care and Youth Development. American Psychologist, 61, 99-116.

Belsky, Vandell, Burchinal, Clarke-Stewart, McCartney, Owen, and the NICHD ECCRN. (2007). Are there long-term effects of early child care? Child Development, 78, 681-701.

Karni, A., Meyer, G., Jezzard, P., Adams, M. M., Turner, R., & Ungerleider, L. G. (1995). Functional MRI evidence for adult motor cortex plasticity during motor skill-learning. Nature, 377, 155–158.

Dorfberger, S., Adi-Japha, E., & Karni, A. (2007). Less selective motor memory consolidation in childhood: reduced susceptibility to interference. PLoS ONE, 2, e240.

Adi-Japha, E., Karni, A., Parnes, A., Loewenschuss, I., & Vakil, E. (2008). A shift in task routines during the learning of a motor skill: group averaged data may mask critical phases in the individuals' acquisition of skilled performance. Journal of Experimental Psychology: Learning, Memory, and Cognition, in press.

16:50 17:20 Coffee break

17:20 18:00 Mauro Gallegati – Financially Constrained Fluctuations in an Evolving Network Economy

Dipartimento di Economia, Università Politecnica delle Marche, Ancona, Italia.

Abstract: We explore the properties of a credit network characterized by inside credit, i.e. credit relationships connecting downstream (D) and upstream (U) firms, and outside credit, i.e. credit relationships connecting firms and banks. The structure of the network changes over time due to the preferred-partner choice rule: each agent chooses the partner who charges the lowest price. The net worth of D firms turns out to be the driver of fluctuations. U production, in fact, is determined by demand of intermediate inputs on the part of D firms and production of the latter is financially constrained, i.e. determined by the availability of internal finance proxied by net worth. The output of simulations shows that at the macroeconomic level a business cycle can develop as a consequence of the complex interaction of the agents' financial conditions.

18:00 18:40 Imre Kondor – Instability of downside risk measures

Collegium Budapest, Budapest, Hungary.

Abstract: We have recently shown that the axioms for coherent risk measures imply that whenever a dominant portfolio can be formed on a given sample (which happens with finite probability even for large samples), then portfolio optimization cannot be performed under any coherent measure on that sample, and the risk measure diverges to minus infinity. The fundamental reason for this instability is that, despite the abundance of financial data, we never have sufficient information for optimizing large portfolios. Here we extend this result and demonstrate that this instability is present in an even larger class of risk measures, including the most popular measure, Value at Risk (VaR). An exact replica calculation allows us to determine the phase boundary where the instability of VaR sets in and where the estimation error diverges. The reason why this instability has not been noticed before is also discussed.

Thursday, September 18th

14:10 14:50 Stefan Bornholdt – Physics of complex networks: Applications in large online markets

Institute for Theoretical Physics, University of Bremen, Bremen, Germany.

Abstract: Newly emerging internet phenomena such as online markets and social networking sites provide an exciting new possibility for detailed studies of large socio-economic systems. They form large complex networks of interactions which call for new methods for their analysis. During recent years, physicists have started to fill this gap by developing methods and tools. I will give a short overview of some of these tools and their scope of application. The focus will lie on methods of community detection and their application to the complex networks of large online markets. One large case study will be presented, firstly to demonstrate the power and limits of analysis of large online markets with physical methods and, secondly, to discuss data issues in this context, from collection and availability to analysis and interpretation.

14:50 15:30 Byungnam Kahng – Quantifying the Complete Trajectory of the Coauthorship Network Evolution

Department of Physics and Astronomy, Seoul National University, Seoul, Korea.

Co-authors: Deokjae Lee, K.-I. Goh and D. Kim

Abstract: We collect empirical datasets to study the evolution of complex networks. The datasets include the coauthorship relations of scientists working on complex networks and on string theory, respectively, and word collocation data from infants. The dataset for the coauthorship network of scientists working on complex networks covers 115 months and, most importantly, spans from the initial point of its evolution until today. The other datasets are also available from the beginning of the evolution, providing us with a unique opportunity to study the complete trajectory of complex network evolution from the seed. Based on the statistics of the evolution rates of various types of edges obtained from the empirical data, we find that the largest cluster of the coauthorship network grows by continuous aggregation with finite clusters during the whole period of evolution. For the word collocation network of infants, however, the largest cluster grows incrementally but dominantly, without developing finite clusters. The largest clusters of the coauthorship networks form tree-like structures in the early stage, and large-scale loop structures follow at later times. The largest cluster of the word collocation network, on the contrary, rapidly evolves into a dense cluster with excessive links and an ultra-short diameter. We can detect the transition from the tree-like structure to a large-scale loop structure by measuring the fluctuation in the diameter of the giant cluster. In tree-like structures, the diameter of the giant cluster can be curtailed abruptly by the emergence of even a single link between distant nodes, and it can be expanded abruptly by cluster merging. Thus, although the number of nodes in the giant cluster of the coauthorship networks grows in a relatively gradual manner, the fluctuation in the largest-cluster diameter is significant.
Following these empirical results, we construct a simple model for coauthorship-type networks to pinpoint the origins and major driving forces of the observed non-trivial evolution pattern. The key ingredient of the model is the so-called locality constraint, motivated by the empirical finding that most new links in the real networks are created between nodes a short distance apart, while new links connecting distant nodes are created rarely. With the locality constraint imposed, the model reproduces the observed evolution pattern of the coauthorship networks.
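The two diameter effects described in the abstract can be illustrated with a toy sketch. It is not the authors' model or data: the chain-shaped clusters, their sizes (21 and 11 nodes) and the helper names path_graph, add_link and component_diameter are assumptions made only for illustration. A chain stands in for a tree-like giant cluster; one link between its far ends cuts the diameter sharply, while one link that merges in a second chain increases it sharply.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def component_diameter(adj, node):
    """Longest shortest path inside the cluster that contains node."""
    component = bfs_distances(adj, node)
    return max(max(bfs_distances(adj, u).values()) for u in component)


def add_link(adj, u, v):
    """Add an undirected link between u and v."""
    adj[u].add(v)
    adj[v].add(u)


def path_graph(nodes):
    """Chain of nodes: a minimal tree-like cluster (illustrative choice)."""
    adj = {n: set() for n in nodes}
    for a, b in zip(nodes, nodes[1:]):
        add_link(adj, a, b)
    return adj


# 1) One long-range link curtails the diameter of a tree-like cluster.
tree = path_graph(list(range(21)))
d_tree = component_diameter(tree, 0)
add_link(tree, 0, 20)  # single link between the two most distant nodes
d_shortcut = component_diameter(tree, 0)

# 2) Merging with another finite cluster expands the diameter.
merged = path_graph(list(range(21)))
merged.update(path_graph(list(range(21, 32))))  # second, separate chain
d_before = component_diameter(merged, 0)
add_link(merged, 20, 21)  # single link that merges the two clusters
d_merged = component_diameter(merged, 0)

print(d_tree, d_shortcut)   # diameter before and after the shortcut
print(d_before, d_merged)   # diameter before and after the merger
```

In this toy setting the number of nodes changes by at most 11 while the diameter jumps by about as much in either direction, which is the kind of large diameter fluctuation the abstract refers to.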

15:30 16:10 Fabrizio Lillo – The evolution of high frequency financial databases: from daily data to agent resolved data

Dipartimento di Fisica e Tecnologie Relative, Università di Palermo, Palermo, Italia; Santa Fe Institute, Santa Fe, NM, USA.

Abstract: I will sketch the historical development of high frequency financial databases from daily (or even quarterly) data to databases with an increasing level of resolution. I will focus on two detailed types of databases, specifically order book data and agent resolved data. In the first case I will present some recent results on the microstructure of financial markets and on the insight one obtains into the price formation process. The agent resolved datasets may contain information on the trading activity of brokers, accounts, or traders. I will present some recent results obtained with this type of data, which allows one to classify agents according to their trading strategy and to study their interaction and the effect of agents' activity on price formation.

16:10 16:40 Coffee break

16:40 18:20 Round table on the use of large databases in research on social and economic complex systems.
